* Re: [PATCH v2 bpf-next 0/9] Revamp test_progs as a test running framework
From: Andrii Nakryiko @ 2019-07-28 2:57 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, bpf, Network Development, Alexei Starovoitov,
Daniel Borkmann, Stanislav Fomichev, Kernel Team
In-Reply-To: <CAEf4BzY3snLh5=qhFo6RNL1RQMcmVhkCiB2s4p57jQcovp5TWw@mail.gmail.com>
On Sat, Jul 27, 2019 at 7:16 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Sat, Jul 27, 2019 at 6:12 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Sat, Jul 27, 2019 at 12:02 PM Andrii Nakryiko <andriin@fb.com> wrote:
> > >
> > > This patch set makes a number of changes to test_progs selftest, which is
> > > a collection of many other tests (and sometimes sub-tests as well), to provide
> > > better testing experience and allow to start convering many individual test
> > > programs under selftests/bpf into a single and convenient test runner.
> >
> > I really like the patches, but something isn't right:
>
> Argh... Uninitialized `int ret` in test__vprintf(). Should be
> initialized to zero, otherwise in some corner cases when log buffer is
> completely full and ret's initial value is sufficiently large negative
> number, it can underflow env.log_cnt, silently skipping one log
> output, and then crashing on next one. You've somehow encountered a
> fascinating series of conditions that I've never stumbled upon running
> my code dozens of times. Fixing, sorry about that!
Ok, I doubt it was that specific bug (even though it is a bug). Turns
out that you can't call vprintf with the same va_list twice, as it
consumes it, so I have to do va_copy() if I might call vprintf
multiple times. So fixing that now.
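For reference, a minimal standalone sketch of the va_copy() pattern. This is
not the actual test_progs.c code; vformat_twice() and format_twice() are
made-up helpers that format the same arguments into two buffers:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: format the same arguments into two buffers. */
static int vformat_twice(char *a, size_t a_sz, char *b, size_t b_sz,
			 const char *fmt, va_list args)
{
	va_list copy;
	int ret;

	/* The first vsnprintf() consumes 'copy' and leaves 'args' untouched. */
	va_copy(copy, args);
	ret = vsnprintf(a, a_sz, fmt, copy);
	va_end(copy);
	if (ret < 0)
		return ret;

	/* 'args' has not been traversed yet, so this second pass is valid. */
	return vsnprintf(b, b_sz, fmt, args);
}

/* Variadic front end, shaped like a printf-style logging wrapper. */
static int format_twice(char *a, size_t a_sz, char *b, size_t b_sz,
			const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = vformat_twice(a, a_sz, b, b_sz, fmt, args);
	va_end(args);
	return ret;
}
```

Without the va_copy(), the second vsnprintf() would read a va_list that has
already been walked, which is undefined behavior.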
>
> > #16 raw_tp_writable_reject_nbd_invalid:OK
> > #17 raw_tp_writable_test_run:OK
> > #18 reference_tracking:OK
> > [ 87.715996] test_progs[2200]: segfault at 2f ip 00007f56aeea347b sp
> > 00007ffce9720980 error 4 in libc-2.23.so[7f56aee5b000+198000]
> > [ 87.717316] Code: ff ff 44 89 8d 30 fb ff ff e8 01 74 fd ff 44 8b
> > 8d 30 fb ff ff 4c 8b 85 28 fb ff ff e9 fd eb ff ff 31 c0 48 83 c9 ff
> > 4c 89 df <f2> ae c7 85 28 fb ff ff 00 00 00 00 48 89 c8 48 f7 d0 4c 8f
> > [ 87.719493] audit: type=1701 audit(1564276195.971:5): auid=0 uid=0
> > gid=0 ses=1 subj=kernel pid=2200 comm="test_progs"
> > exe="/data/users/ast/net-next/tools/testing/selftests/bpf/test_progs"
> > sig=11 res=1
> > Segmentation fault (core dumped)
> >
> > Under gdb fault is different:
> > #23 stacktrace_build_id:OK
> > Detaching after fork from child process 2276.
> > Detaching after fork from child process 2278.
> > [ 149.013116] perf: interrupt took too long (6799 > 6713), lowering
> > kernel.perf_event_max_sample_rate to 29000
> > [ 149.014634] perf: interrupt took too long (8511 > 8498), lowering
> > kernel.perf_event_max_sample_rate to 23000
> > [ 149.017038] perf: interrupt took too long (10649 > 10638), lowering
> > kernel.perf_event_max_sample_rate to 18000
> > [ 149.021901] perf: interrupt took too long (13322 > 13311), lowering
> > kernel.perf_event_max_sample_rate to 15000
> > [ 149.042946] perf: interrupt took too long (16660 > 16652), lowering
> > kernel.perf_event_max_sample_rate to 12000
> > Detaching after fork from child process 2279.
> > #24 stacktrace_build_id_nmi:OK
> > #25 stacktrace_map:OK
> > #26 stacktrace_map_raw_tp:OK
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x00007ffff723f47b in vfprintf () from /usr/lib/libc.so.6
> > (gdb) bt
> > #0 0x00007ffff723f47b in vfprintf () from /usr/lib/libc.so.6
> > #1 0x00007ffff72655a9 in vsnprintf () from /usr/lib/libc.so.6
> > #2 0x0000000000403100 in test__vprintf (fmt=0x426754 "%s:PASS:%s %d
> > nsec\n", args=0x7fffffffe878) at test_progs.c:114
> > #3 0x000000000040325c in test__printf (fmt=fmt@entry=0x426754
> > "%s:PASS:%s %d nsec\n") at test_progs.c:147
> > #4 0x000000000042222d in test_task_fd_query_rawtp () at
> > prog_tests/task_fd_query_rawtp.c:19
> > #5 0x0000000000402c76 in main (argc=<optimized out>, argv=<optimized
> > out>) at test_progs.c:501
> > (gdb) info threads
> > Id Target Id Frame
> > * 1 Thread 0x7ffff7fea700 (LWP 2245) "test_progs"
> > 0x00007ffff723f47b in vfprintf () from /usr/lib/libc.so.6
* [PATCH v3 bpf-next 0/9] Revamp test_progs as a test running framework
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
This patch set makes a number of changes to the test_progs selftest, which is
a collection of many other tests (and sometimes sub-tests as well), to provide
a better testing experience and to allow converting many individual test
programs under selftests/bpf into a single, convenient test runner.
Patch #1 fixes a Makefile issue that causes prog_tests/tests.h to be compiled
as C code. This fix makes it possible to change how tests.h is generated,
giving more control over which tests are run and how.
Patch #2 changes how tests.h is auto-generated, so that it provides test
definitions instead of just calls to test functions. This makes more
complicated test run policies possible.
Patch #3 adds `-t <test-name>` and `-n <test-num>` selectors to run only
a subset of tests.
Patch #4 changes libbpf_set_print() to return the previously set print
callback, which makes it possible to temporarily replace the current print
callback and then set it back. This is necessary for some tests that want
more control over libbpf logging.
Patch #5 moves libbpf logging setup from individual tests into the test_progs
runner and adds -vv verbosity to capture debug output from libbpf. This is
useful when debugging failing tests.
Patch #6 furthers test output management and buffers it by default, emitting
log output only if a test fails. This gives succinct and clean default test
output. It's possible to bypass this behavior with the -v flag, which turns
off test output buffering.
Patch #7 adds support for sub-tests. It also extends the -t and -n selectors
to accept sub-test selectors, and extends the number selector to accept sets
of test numbers instead of just an individual test number.
Patch #8 converts bpf_verif_scale.c test to use sub-test APIs.
Patch #9 converts send_signal.c tests to use sub-test APIs.
v2->v3:
- fix buffered output rare uninitialized value bug (Alexei);
- fix buffered output va_list reuse bug (Alexei);
- fix buffered output truncation due to interleaving zero terminators;
v1->v2:
- drop libbpf_swap_print, instead return previous function from
libbpf_set_print (Stanislav);
Andrii Nakryiko (9):
selftests/bpf: prevent headers to be compiled as C code
selftests/bpf: revamp test_progs to allow more control
selftests/bpf: add test selectors by number and name to test_progs
libbpf: return previous print callback from libbpf_set_print
selftest/bpf: centralize libbpf logging management for test_progs
selftests/bpf: abstract away test log output
selftests/bpf: add sub-tests support for test_progs
selftests/bpf: convert bpf_verif_scale.c to sub-tests API
selftests/bpf: convert send_signal.c to use subtests
tools/lib/bpf/libbpf.c | 5 +-
tools/lib/bpf/libbpf.h | 2 +-
tools/testing/selftests/bpf/Makefile | 14 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 6 +-
.../bpf/prog_tests/bpf_verif_scale.c | 90 ++--
.../bpf/prog_tests/get_stack_raw_tp.c | 4 +-
.../selftests/bpf/prog_tests/l4lb_all.c | 2 +-
.../selftests/bpf/prog_tests/map_lock.c | 10 +-
.../bpf/prog_tests/reference_tracking.c | 15 +-
.../selftests/bpf/prog_tests/send_signal.c | 17 +-
.../selftests/bpf/prog_tests/spinlock.c | 2 +-
.../bpf/prog_tests/stacktrace_build_id.c | 4 +-
.../bpf/prog_tests/stacktrace_build_id_nmi.c | 4 +-
.../selftests/bpf/prog_tests/xdp_noinline.c | 3 +-
tools/testing/selftests/bpf/test_progs.c | 383 +++++++++++++++++-
tools/testing/selftests/bpf/test_progs.h | 45 +-
16 files changed, 502 insertions(+), 104 deletions(-)
--
2.17.1
* [PATCH v3 bpf-next 1/9] selftests/bpf: prevent headers to be compiled as C code
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Apparently listing a header as a normal dependency for a binary output
makes it go through compilation as if it were C code. This currently
works without a problem, but in subsequent commits it causes problems for
the differently generated tests.h for test_progs. Marking those headers as
order-only dependencies solves the issue.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/testing/selftests/bpf/Makefile | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 11c9c62c3362..bb66cc4a7f34 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -235,7 +235,7 @@ PROG_TESTS_H := $(PROG_TESTS_DIR)/tests.h
PROG_TESTS_FILES := $(wildcard prog_tests/*.c)
test_progs.c: $(PROG_TESTS_H)
$(OUTPUT)/test_progs: CFLAGS += $(TEST_PROGS_CFLAGS)
-$(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_H) $(PROG_TESTS_FILES)
+$(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_FILES) | $(PROG_TESTS_H)
$(PROG_TESTS_H): $(PROG_TESTS_FILES) | $(PROG_TESTS_DIR)
$(shell ( cd prog_tests/; \
echo '/* Generated header, do not edit */'; \
@@ -256,7 +256,7 @@ MAP_TESTS_H := $(MAP_TESTS_DIR)/tests.h
MAP_TESTS_FILES := $(wildcard map_tests/*.c)
test_maps.c: $(MAP_TESTS_H)
$(OUTPUT)/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
-$(OUTPUT)/test_maps: test_maps.c $(MAP_TESTS_H) $(MAP_TESTS_FILES)
+$(OUTPUT)/test_maps: test_maps.c $(MAP_TESTS_FILES) | $(MAP_TESTS_H)
$(MAP_TESTS_H): $(MAP_TESTS_FILES) | $(MAP_TESTS_DIR)
$(shell ( cd map_tests/; \
echo '/* Generated header, do not edit */'; \
@@ -277,7 +277,7 @@ VERIFIER_TESTS_H := $(VERIFIER_TESTS_DIR)/tests.h
VERIFIER_TEST_FILES := $(wildcard verifier/*.c)
test_verifier.c: $(VERIFIER_TESTS_H)
$(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
-$(OUTPUT)/test_verifier: test_verifier.c $(VERIFIER_TESTS_H)
+$(OUTPUT)/test_verifier: test_verifier.c | $(VERIFIER_TEST_FILES) $(VERIFIER_TESTS_H)
$(VERIFIER_TESTS_H): $(VERIFIER_TEST_FILES) | $(VERIFIER_TESTS_DIR)
$(shell ( cd verifier/; \
echo '/* Generated header, do not edit */'; \
--
2.17.1
* [PATCH v3 bpf-next 4/9] libbpf: return previous print callback from libbpf_set_print
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
By returning the previously set print callback from libbpf_set_print, it's
possible to restore it later. This is useful when running many
independent tests with one default print function, but overriding log
verbosity for a particular subset of tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
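To illustrate the intended save/restore use, here is a minimal standalone
sketch. It does not link against libbpf: print_fn_t, set_print(), log_msg(),
noisy_step() and run_quietly() are stand-ins made up for this example, with
set_print() only mimicking the return-previous behavior added by this patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-ins for libbpf_print_fn_t / libbpf_set_print(), so the sketch
 * builds without libbpf. Only the "return previous callback" part matters.
 */
typedef int (*print_fn_t)(const char *fmt, va_list ap);

static int default_print(const char *fmt, va_list ap)
{
	return vfprintf(stderr, fmt, ap);
}

static print_fn_t cur_print = default_print;

static print_fn_t set_print(print_fn_t fn)
{
	print_fn_t old = cur_print;

	cur_print = fn;
	return old;
}

/* Callback that drops every message. */
static int quiet_print(const char *fmt, va_list ap)
{
	(void)fmt;
	(void)ap;
	return 0;
}

/* Logging entry point that dispatches to the current callback. */
static int log_msg(const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = cur_print(fmt, ap);
	va_end(ap);
	return ret;
}

/* Hypothetical test step that would normally produce log noise. */
static int noisy_step(void)
{
	return log_msg("expected verifier failure\n");
}

/* Run 'fn' with logging silenced, then restore whatever was installed. */
static int run_quietly(int (*fn)(void))
{
	print_fn_t old = set_print(quiet_print);
	int err = fn();

	set_print(old);
	return err;
}
```

The caller does not need to know which callback was active before; it just
hands back whatever set_print() returned.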
tools/lib/bpf/libbpf.c | 5 ++++-
tools/lib/bpf/libbpf.h | 2 +-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8741c39adb1c..ead915aec349 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -74,9 +74,12 @@ static int __base_pr(enum libbpf_print_level level, const char *format,
static libbpf_print_fn_t __libbpf_pr = __base_pr;
-void libbpf_set_print(libbpf_print_fn_t fn)
+libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn)
{
+ libbpf_print_fn_t old_print_fn = __libbpf_pr;
+
__libbpf_pr = fn;
+ return old_print_fn;
}
__printf(2, 3)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5cbf459ece0b..8a9d462a6f6d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -57,7 +57,7 @@ enum libbpf_print_level {
typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
const char *, va_list ap);
-LIBBPF_API void libbpf_set_print(libbpf_print_fn_t fn);
+LIBBPF_API libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn);
/* Hide internal to user */
struct bpf_object;
--
2.17.1
* [PATCH v3 bpf-next 2/9] selftests/bpf: revamp test_progs to allow more control
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Refactor test_progs to allow better control over what's being run.
Also use argp for argument parsing, so that it's easier to keep adding
more options.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
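The DEFINE_TEST() trick used below is a plain X-macro: the same list is
expanded twice with different macro definitions. A minimal standalone sketch,
assuming a TEST_LIST macro in place of the generated tests.h and two made-up
tests (alpha, beta); run_all() and the runs counter are also just for
illustration:

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the generated tests.h: one DEFINE_TEST() per test file. */
#define TEST_LIST \
	DEFINE_TEST(alpha) \
	DEFINE_TEST(beta)

/* First expansion: forward declarations of the test functions. */
#define DEFINE_TEST(name) void test_##name(void);
TEST_LIST
#undef DEFINE_TEST

struct prog_test_def {
	const char *test_name;
	void (*run_test)(void);
};

/* Second expansion: one table entry per test. */
static const struct prog_test_def prog_test_defs[] = {
#define DEFINE_TEST(name) { .test_name = #name, .run_test = test_##name },
	TEST_LIST
#undef DEFINE_TEST
};

/* Hypothetical test bodies; they only record that they ran. */
static int runs;

void test_alpha(void)
{
	runs += 1;
}

void test_beta(void)
{
	runs += 10;
}

/* Walk the table and run every test; returns the number of tests run. */
static int run_all(void)
{
	size_t i, n = sizeof(prog_test_defs) / sizeof(prog_test_defs[0]);

	for (i = 0; i < n; i++) {
		printf("running %s\n", prog_test_defs[i].test_name);
		prog_test_defs[i].run_test();
	}
	return (int)n;
}
```

Having the tests in a table, rather than as a sequence of calls, is what lets
later patches filter, number and report on them individually.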
tools/testing/selftests/bpf/Makefile | 8 +--
tools/testing/selftests/bpf/test_progs.c | 84 +++++++++++++++++++++---
2 files changed, 77 insertions(+), 15 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index bb66cc4a7f34..3bd0f4a0336a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -239,14 +239,8 @@ $(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_FILES) | $(PROG_TESTS_H)
$(PROG_TESTS_H): $(PROG_TESTS_FILES) | $(PROG_TESTS_DIR)
$(shell ( cd prog_tests/; \
echo '/* Generated header, do not edit */'; \
- echo '#ifdef DECLARE'; \
ls *.c 2> /dev/null | \
- sed -e 's@\([^\.]*\)\.c@extern void test_\1(void);@'; \
- echo '#endif'; \
- echo '#ifdef CALL'; \
- ls *.c 2> /dev/null | \
- sed -e 's@\([^\.]*\)\.c@test_\1();@'; \
- echo '#endif' \
+ sed -e 's@\([^\.]*\)\.c@DEFINE_TEST(\1)@'; \
) > $(PROG_TESTS_H))
MAP_TESTS_DIR = $(OUTPUT)/map_tests
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index dae0819b1141..eea88ba59225 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -3,6 +3,7 @@
*/
#include "test_progs.h"
#include "bpf_rlimit.h"
+#include <argp.h>
int error_cnt, pass_cnt;
bool jit_enabled;
@@ -156,22 +157,89 @@ void *spin_lock_thread(void *arg)
pthread_exit(arg);
}
-#define DECLARE
+/* extern declarations for test funcs */
+#define DEFINE_TEST(name) extern void test_##name();
#include <prog_tests/tests.h>
-#undef DECLARE
+#undef DEFINE_TEST
-int main(int ac, char **av)
+struct prog_test_def {
+ const char *test_name;
+ void (*run_test)(void);
+};
+
+static struct prog_test_def prog_test_defs[] = {
+#define DEFINE_TEST(name) { \
+ .test_name = #name, \
+ .run_test = &test_##name, \
+},
+#include <prog_tests/tests.h>
+#undef DEFINE_TEST
+};
+
+const char *argp_program_version = "test_progs 0.1";
+const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
+const char argp_program_doc[] = "BPF selftests test runner";
+
+enum ARG_KEYS {
+ ARG_VERIFIER_STATS = 's',
+};
+
+static const struct argp_option opts[] = {
+ { "verifier-stats", ARG_VERIFIER_STATS, NULL, 0,
+ "Output verifier statistics", },
+ {},
+};
+
+struct test_env {
+ bool verifier_stats;
+};
+
+static struct test_env env = {};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
+ struct test_env *env = state->input;
+
+ switch (key) {
+ case ARG_VERIFIER_STATS:
+ env->verifier_stats = true;
+ break;
+ case ARGP_KEY_ARG:
+ argp_usage(state);
+ break;
+ case ARGP_KEY_END:
+ break;
+ default:
+ return ARGP_ERR_UNKNOWN;
+ }
+ return 0;
+}
+
+
+int main(int argc, char **argv)
+{
+ static const struct argp argp = {
+ .options = opts,
+ .parser = parse_arg,
+ .doc = argp_program_doc,
+ };
+ const struct prog_test_def *def;
+ int err, i;
+
+ err = argp_parse(&argp, argc, argv, 0, NULL, &env);
+ if (err)
+ return err;
+
srand(time(NULL));
jit_enabled = is_jit_enabled();
- if (ac == 2 && strcmp(av[1], "-s") == 0)
- verifier_stats = true;
+ verifier_stats = env.verifier_stats;
-#define CALL
-#include <prog_tests/tests.h>
-#undef CALL
+ for (i = 0; i < ARRAY_SIZE(prog_test_defs); i++) {
+ def = &prog_test_defs[i];
+ def->run_test();
+ }
printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
--
2.17.1
* [PATCH v3 bpf-next 5/9] selftest/bpf: centralize libbpf logging management for test_progs
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Make the test_progs test runner own libbpf logging. Also introduce two
levels of verbosity: -v and -vv. The first one will be used in subsequent
patches to always enable test log output. The second one further increases
the verbosity of libbpf logging to include debug output as well.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
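A note on how -vv is recognized: since the option takes an optional argument,
"-v" arrives with a NULL argument and "-vv" arrives as -v with the argument
"v". The sketch below isolates that logic from argp so it can be built on its
own; struct verbosity, parse_verbosity(), enum level and should_print() are
names made up for this example, not part of the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct verbosity {
	bool verbose;
	bool very_verbose;
};

/* 'arg' is what the option parser hands over for -v:
 * NULL for plain "-v", "v" for "-vv". Anything else is rejected.
 */
static int parse_verbosity(const char *arg, struct verbosity *v)
{
	if (arg) {
		if (strcmp(arg, "v") != 0)
			return -EINVAL;
		v->very_verbose = true;
	}
	v->verbose = true;
	return 0;
}

/* Stand-in for enum libbpf_print_level. */
enum level { LEVEL_WARN, LEVEL_INFO, LEVEL_DEBUG };

/* Mirrors the print filter: debug messages only pass with -vv. */
static bool should_print(const struct verbosity *v, enum level level)
{
	return v->very_verbose || level != LEVEL_DEBUG;
}
```

Warnings and info messages always pass the filter; only the debug level is
gated on -vv.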
.../bpf/prog_tests/bpf_verif_scale.c | 6 +++-
.../bpf/prog_tests/reference_tracking.c | 15 +++-------
tools/testing/selftests/bpf/test_progs.c | 29 +++++++++++++++++++
3 files changed, 38 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index e1b55261526f..ceddb8cc86f4 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -70,10 +70,11 @@ void test_bpf_verif_scale(void)
const char *cg_sysctl[] = {
"./test_sysctl_loop1.o", "./test_sysctl_loop2.o",
};
+ libbpf_print_fn_t old_print_fn = NULL;
int err, i;
if (verifier_stats)
- libbpf_set_print(libbpf_debug_print);
+ old_print_fn = libbpf_set_print(libbpf_debug_print);
err = check_load("./loop3.o", BPF_PROG_TYPE_RAW_TRACEPOINT);
printf("test_scale:loop3:%s\n", err ? (error_cnt--, "OK") : "FAIL");
@@ -97,4 +98,7 @@ void test_bpf_verif_scale(void)
err = check_load("./test_seg6_loop.o", BPF_PROG_TYPE_LWT_SEG6LOCAL);
printf("test_scale:test_seg6_loop:%s\n", err ? "FAIL" : "OK");
+
+ if (verifier_stats)
+ libbpf_set_print(old_print_fn);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
index 5633be43828f..4a4f428d1a78 100644
--- a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
+++ b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
@@ -1,15 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
-static int libbpf_debug_print(enum libbpf_print_level level,
- const char *format, va_list args)
-{
- if (level == LIBBPF_DEBUG)
- return 0;
-
- return vfprintf(stderr, format, args);
-}
-
void test_reference_tracking(void)
{
const char *file = "./test_sk_lookup_kern.o";
@@ -36,9 +27,11 @@ void test_reference_tracking(void)
/* Expect verifier failure if test name has 'fail' */
if (strstr(title, "fail") != NULL) {
- libbpf_set_print(NULL);
+ libbpf_print_fn_t old_print_fn;
+
+ old_print_fn = libbpf_set_print(NULL);
err = !bpf_program__load(prog, "GPL", 0);
- libbpf_set_print(libbpf_debug_print);
+ libbpf_set_print(old_print_fn);
} else {
err = bpf_program__load(prog, "GPL", 0);
}
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 6e04b9f83777..94b6951b90b3 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -186,6 +186,8 @@ enum ARG_KEYS {
ARG_TEST_NUM = 'n',
ARG_TEST_NAME = 't',
ARG_VERIFIER_STATS = 's',
+
+ ARG_VERBOSE = 'v',
};
static const struct argp_option opts[] = {
@@ -195,6 +197,8 @@ static const struct argp_option opts[] = {
"Run tests with names containing NAME" },
{ "verifier-stats", ARG_VERIFIER_STATS, NULL, 0,
"Output verifier statistics", },
+ { "verbose", ARG_VERBOSE, "LEVEL", OPTION_ARG_OPTIONAL,
+ "Verbose output (use -vv for extra verbose output)" },
{},
};
@@ -202,12 +206,22 @@ struct test_env {
int test_num_selector;
const char *test_name_selector;
bool verifier_stats;
+ bool verbose;
+ bool very_verbose;
};
static struct test_env env = {
.test_num_selector = -1,
};
+static int libbpf_print_fn(enum libbpf_print_level level,
+ const char *format, va_list args)
+{
+ if (!env.very_verbose && level == LIBBPF_DEBUG)
+ return 0;
+ return vfprintf(stderr, format, args);
+}
+
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
struct test_env *env = state->input;
@@ -229,6 +243,19 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
case ARG_VERIFIER_STATS:
env->verifier_stats = true;
break;
+ case ARG_VERBOSE:
+ if (arg) {
+ if (strcmp(arg, "v") == 0) {
+ env->very_verbose = true;
+ } else {
+ fprintf(stderr,
+ "Unrecognized verbosity setting ('%s'), only -v and -vv are supported\n",
+ arg);
+ return -EINVAL;
+ }
+ }
+ env->verbose = true;
+ break;
case ARGP_KEY_ARG:
argp_usage(state);
break;
@@ -255,6 +282,8 @@ int main(int argc, char **argv)
if (err)
return err;
+ libbpf_set_print(libbpf_print_fn);
+
srand(time(NULL));
jit_enabled = is_jit_enabled();
--
2.17.1
* [PATCH v3 bpf-next 3/9] selftests/bpf: add test selectors by number and name to test_progs
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Add the ability to specify either a test number or a test name substring to
narrow down the set of tests to run.
Usage:
sudo ./test_progs -n 1
sudo ./test_progs -t attach_probe
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
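The selection rule boils down to two independent filters that must both pass.
A minimal standalone sketch of that rule; struct selector and selected() are
names made up for this example and only mirror the checks done in main():

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct selector {
	int num;		/* -1 means "any test number" */
	const char *name;	/* NULL means "any test name" */
};

/* A test runs only if it passes both the number and the name filter. */
static bool selected(const struct selector *sel, int test_num,
		     const char *test_name)
{
	if (sel->num >= 0 && test_num != sel->num)
		return false;
	/* Name matching is by substring, not by exact match. */
	if (sel->name && !strstr(test_name, sel->name))
		return false;
	return true;
}
```

Because test numbers are 1-based, a selector of 0 passes the ">= 0" check but
matches no test at all.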
tools/testing/selftests/bpf/test_progs.c | 43 +++++++++++++++++++++---
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index eea88ba59225..6e04b9f83777 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -4,6 +4,7 @@
#include "test_progs.h"
#include "bpf_rlimit.h"
#include <argp.h>
+#include <string.h>
int error_cnt, pass_cnt;
bool jit_enabled;
@@ -164,6 +165,7 @@ void *spin_lock_thread(void *arg)
struct prog_test_def {
const char *test_name;
+ int test_num;
void (*run_test)(void);
};
@@ -181,26 +183,49 @@ const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
const char argp_program_doc[] = "BPF selftests test runner";
enum ARG_KEYS {
+ ARG_TEST_NUM = 'n',
+ ARG_TEST_NAME = 't',
ARG_VERIFIER_STATS = 's',
};
static const struct argp_option opts[] = {
+ { "num", ARG_TEST_NUM, "NUM", 0,
+ "Run test number NUM only " },
+ { "name", ARG_TEST_NAME, "NAME", 0,
+ "Run tests with names containing NAME" },
{ "verifier-stats", ARG_VERIFIER_STATS, NULL, 0,
"Output verifier statistics", },
{},
};
struct test_env {
+ int test_num_selector;
+ const char *test_name_selector;
bool verifier_stats;
};
-static struct test_env env = {};
+static struct test_env env = {
+ .test_num_selector = -1,
+};
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
struct test_env *env = state->input;
switch (key) {
+ case ARG_TEST_NUM: {
+ int test_num;
+
+ errno = 0;
+ test_num = strtol(arg, NULL, 10);
+ if (errno)
+ return -errno;
+ env->test_num_selector = test_num;
+ break;
+ }
+ case ARG_TEST_NAME:
+ env->test_name_selector = arg;
+ break;
case ARG_VERIFIER_STATS:
env->verifier_stats = true;
break;
@@ -223,7 +248,7 @@ int main(int argc, char **argv)
.parser = parse_arg,
.doc = argp_program_doc,
};
- const struct prog_test_def *def;
+ struct prog_test_def *test;
int err, i;
err = argp_parse(&argp, argc, argv, 0, NULL, &env);
@@ -237,8 +262,18 @@ int main(int argc, char **argv)
verifier_stats = env.verifier_stats;
for (i = 0; i < ARRAY_SIZE(prog_test_defs); i++) {
- def = &prog_test_defs[i];
- def->run_test();
+ test = &prog_test_defs[i];
+
+ test->test_num = i + 1;
+
+ if (env.test_num_selector >= 0 &&
+ test->test_num != env.test_num_selector)
+ continue;
+ if (env.test_name_selector &&
+ !strstr(test->test_name, env.test_name_selector))
+ continue;
+
+ test->run_test();
}
printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
--
2.17.1
* [PATCH v3 bpf-next 7/9] selftests/bpf: add sub-tests support for test_progs
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Allow tests to have their own set of sub-tests. Also add the ability to do
test/subtest selection using `-t <test-name>/<subtest-name>` and `-n
<test-num-set>/<subtest-num-set>`, as an extension of the existing -t/-n
selector options. The <test-num-set> format is a comma-separated list of
either individual test numbers (1-based) or ranges of test numbers. E.g.,
all of the following are valid sets of test numbers:
- 10
- 1,2,3
- 1-3
- 5-10,1,3-4
The '/<subtest-num-set>' part is optional, but has the same format. E.g., to
select test #3 and its sub-tests #10 through #15, use: -n 3/10-15.
Similarly, to select tests by name, use `-t verif/strobe`:
$ sudo ./test_progs -t verif/strobe
#3/12 strobemeta.o:OK
#3/13 strobemeta_nounroll1.o:OK
#3/14 strobemeta_nounroll2.o:OK
#3 bpf_verif_scale:OK
Summary: 1/3 PASSED, 0 FAILED
Example of using subtest API is in the next patch, converting
bpf_verif_scale.c tests to use sub-tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
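To make the number-set format concrete, here is a standalone sketch of a
parser for it. It is written from scratch for illustration and is not the
parse_num_list() from this patch: parse_num_set(), num_selected() and the
NUM_SET_MAX cap are assumptions of the sketch. It is stricter about empty
fields, and it zeroes the newly grown part of the array, because realloc()
leaves that memory uninitialized.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NUM_SET_MAX 65536	/* arbitrary cap chosen for this sketch */

/* Parse e.g. "5-10,1,3-4" into a bool array indexed by number.
 * On success returns 0 and stores a heap-allocated array in *out_set.
 */
static int parse_num_set(const char *s, bool **out_set, int *out_len)
{
	bool *set = NULL, *tmp;
	int set_len = 0, i;
	long start, end;
	char *next;

	while (*s) {
		errno = 0;
		start = strtol(s, &next, 10);
		if (errno || next == s || start < 0)
			goto err;
		end = start;
		if (*next == '-') {
			/* range: parse the upper bound */
			s = next + 1;
			errno = 0;
			end = strtol(s, &next, 10);
			if (errno || next == s || end < start)
				goto err;
		}
		if (end >= NUM_SET_MAX)
			goto err;
		if (*next == ',')
			next++;
		else if (*next != '\0')
			goto err;

		if (end + 1 > set_len) {
			tmp = realloc(set, (size_t)(end + 1) * sizeof(*set));
			if (!tmp) {
				free(set);
				return -ENOMEM;
			}
			/* realloc() does not zero the new tail, do it here */
			memset(tmp + set_len, 0,
			       (size_t)(end + 1 - set_len) * sizeof(*set));
			set = tmp;
			set_len = (int)end + 1;
		}
		for (i = (int)start; i <= (int)end; i++)
			set[i] = true;
		s = next;
	}

	if (!set)
		return -EINVAL;
	*out_set = set;
	*out_len = set_len;
	return 0;
err:
	free(set);
	return -EINVAL;
}

/* Membership check; numbers outside the array are not selected. */
static bool num_selected(const bool *set, int len, int num)
{
	return num >= 0 && num < len && set[num];
}
```

The caller owns the returned array and is expected to free() it.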
tools/testing/selftests/bpf/test_progs.c | 198 ++++++++++++++++++++---
tools/testing/selftests/bpf/test_progs.h | 16 +-
2 files changed, 185 insertions(+), 29 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 1b7470d3da22..546d99b3ec34 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -7,9 +7,7 @@
#include <string.h>
/* defined in test_progs.h */
-struct test_env env = {
- .test_num_selector = -1,
-};
+struct test_env env;
int error_cnt, pass_cnt;
struct prog_test_def {
@@ -20,8 +18,82 @@ struct prog_test_def {
int pass_cnt;
int error_cnt;
bool tested;
+
+ const char *subtest_name;
+ int subtest_num;
+
+ /* store counts before subtest started */
+ int old_pass_cnt;
+ int old_error_cnt;
};
+static bool should_run(struct test_selector *sel, int num, const char *name)
+{
+ if (sel->name && sel->name[0] && !strstr(name, sel->name))
+ return false;
+
+ if (!sel->num_set)
+ return true;
+
+ return num < sel->num_set_len && sel->num_set[num];
+}
+
+static void dump_test_log(const struct prog_test_def *test, bool failed)
+{
+ if (env.verbose || test->force_log || failed) {
+ if (env.log_cnt) {
+ fprintf(stdout, "%s", env.log_buf);
+ if (env.log_buf[env.log_cnt - 1] != '\n')
+ fprintf(stdout, "\n");
+ }
+ env.log_cnt = 0;
+ }
+}
+
+void test__end_subtest()
+{
+ struct prog_test_def *test = env.test;
+ int sub_error_cnt = error_cnt - test->old_error_cnt;
+
+ if (sub_error_cnt)
+ env.fail_cnt++;
+ else
+ env.sub_succ_cnt++;
+
+ dump_test_log(test, sub_error_cnt);
+
+ printf("#%d/%d %s:%s\n",
+ test->test_num, test->subtest_num,
+ test->subtest_name, sub_error_cnt ? "FAIL" : "OK");
+}
+
+bool test__start_subtest(const char *name)
+{
+ struct prog_test_def *test = env.test;
+
+ if (test->subtest_name) {
+ test__end_subtest();
+ test->subtest_name = NULL;
+ }
+
+ test->subtest_num++;
+
+ if (!name || !name[0]) {
+ fprintf(stderr, "Subtest #%d didn't provide sub-test name!\n",
+ test->subtest_num);
+ return false;
+ }
+
+ if (!should_run(&env.subtest_selector, test->subtest_num, name))
+ return false;
+
+ test->subtest_name = name;
+ env.test->old_pass_cnt = pass_cnt;
+ env.test->old_error_cnt = error_cnt;
+
+ return true;
+}
+
void test__force_log() {
env.test->force_log = true;
}
@@ -281,24 +353,103 @@ static int libbpf_print_fn(enum libbpf_print_level level,
return 0;
}
+int parse_num_list(const char *s, struct test_selector *sel)
+{
+ int i, set_len = 0, num, start = 0, end = -1;
+ bool *set = NULL, *tmp, parsing_end = false;
+ char *next;
+
+ while (s[0]) {
+ errno = 0;
+ num = strtol(s, &next, 10);
+ if (errno)
+ return -errno;
+
+ if (parsing_end)
+ end = num;
+ else
+ start = num;
+
+ if (!parsing_end && *next == '-') {
+ s = next + 1;
+ parsing_end = true;
+ continue;
+ } else if (*next == ',') {
+ parsing_end = false;
+ s = next + 1;
+ end = num;
+ } else if (*next == '\0') {
+ parsing_end = false;
+ s = next;
+ end = num;
+ } else {
+ return -EINVAL;
+ }
+
+ if (start > end)
+ return -EINVAL;
+
+ if (end + 1 > set_len) {
+ set_len = end + 1;
+ tmp = realloc(set, set_len);
+ if (!tmp) {
+ free(set);
+ return -ENOMEM;
+ }
+ set = tmp;
+ }
+ for (i = start; i <= end; i++) {
+ set[i] = true;
+ }
+
+ }
+
+ if (!set)
+ return -EINVAL;
+
+ sel->num_set = set;
+ sel->num_set_len = set_len;
+
+ return 0;
+}
+
static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
struct test_env *env = state->input;
switch (key) {
case ARG_TEST_NUM: {
- int test_num;
+ char *subtest_str = strchr(arg, '/');
- errno = 0;
- test_num = strtol(arg, NULL, 10);
- if (errno)
- return -errno;
- env->test_num_selector = test_num;
+ if (subtest_str) {
+ *subtest_str = '\0';
+ if (parse_num_list(subtest_str + 1,
+ &env->subtest_selector)) {
+ fprintf(stderr,
+ "Failed to parse subtest numbers.\n");
+ return -EINVAL;
+ }
+ }
+ if (parse_num_list(arg, &env->test_selector)) {
+ fprintf(stderr, "Failed to parse test numbers.\n");
+ return -EINVAL;
+ }
break;
}
- case ARG_TEST_NAME:
- env->test_name_selector = arg;
+ case ARG_TEST_NAME: {
+ char *subtest_str = strchr(arg, '/');
+
+ if (subtest_str) {
+ *subtest_str = '\0';
+ env->subtest_selector.name = strdup(subtest_str + 1);
+ if (!env->subtest_selector.name)
+ return -ENOMEM;
+ }
+ env->test_selector.name = strdup(arg);
+ if (!env->test_selector.name)
+ return -ENOMEM;
break;
+ }
case ARG_VERIFIER_STATS:
env->verifier_stats = true;
break;
@@ -353,14 +504,15 @@ int main(int argc, char **argv)
env.test = test;
test->test_num = i + 1;
- if (env.test_num_selector >= 0 &&
- test->test_num != env.test_num_selector)
- continue;
- if (env.test_name_selector &&
- !strstr(test->test_name, env.test_name_selector))
+ if (!should_run(&env.test_selector,
+ test->test_num, test->test_name))
continue;
test->run_test();
+ /* ensure last sub-test is finalized properly */
+ if (test->subtest_name)
+ test__end_subtest();
+
test->tested = true;
test->pass_cnt = pass_cnt - old_pass_cnt;
test->error_cnt = error_cnt - old_error_cnt;
@@ -369,21 +521,17 @@ int main(int argc, char **argv)
else
env.succ_cnt++;
- if (env.verbose || test->force_log || test->error_cnt) {
- if (env.log_cnt) {
- fprintf(stdout, "%s", env.log_buf);
- if (env.log_buf[env.log_cnt - 1] != '\n')
- fprintf(stdout, "\n");
- }
- }
- env.log_cnt = 0;
+ dump_test_log(test, test->error_cnt);
printf("#%d %s:%s\n", test->test_num, test->test_name,
test->error_cnt ? "FAIL" : "OK");
}
- printf("Summary: %d PASSED, %d FAILED\n", env.succ_cnt, env.fail_cnt);
+ printf("Summary: %d/%d PASSED, %d FAILED\n",
+ env.succ_cnt, env.sub_succ_cnt, env.fail_cnt);
free(env.log_buf);
+ free(env.test_selector.num_set);
+ free(env.subtest_selector.num_set);
return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
}
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 62f55a4231e9..afd14962456f 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -40,9 +40,15 @@ typedef __u16 __sum16;
struct prog_test_def;
+struct test_selector {
+ const char *name;
+ bool *num_set;
+ int num_set_len;
+};
+
struct test_env {
- int test_num_selector;
- const char *test_name_selector;
+ struct test_selector test_selector;
+ struct test_selector subtest_selector;
bool verifier_stats;
bool verbose;
bool very_verbose;
@@ -54,8 +60,9 @@ struct test_env {
size_t log_cnt;
size_t log_cap;
- int succ_cnt;
- int fail_cnt;
+ int succ_cnt; /* successful tests */
+ int sub_succ_cnt; /* successful sub-tests */
+ int fail_cnt; /* total failed tests + sub-tests */
};
extern int error_cnt;
@@ -65,6 +72,7 @@ extern struct test_env env;
extern void test__printf(const char *fmt, ...);
extern void test__vprintf(const char *fmt, va_list args);
extern void test__force_log();
+extern bool test__start_subtest(const char *name);
#define MAGIC_BYTES 123
--
2.17.1
^ permalink raw reply related
* [PATCH v3 bpf-next 8/9] selftests/bpf: convert bpf_verif_scale.c to sub-tests API
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Expose each BPF verifier scale test as an individual sub-test to allow
independent result output and test selection.
Test run results now look like this:
$ sudo ./test_progs -t verif/
#3/1 loop3.o:OK
#3/2 test_verif_scale1.o:OK
#3/3 test_verif_scale2.o:OK
#3/4 test_verif_scale3.o:OK
#3/5 pyperf50.o:OK
#3/6 pyperf100.o:OK
#3/7 pyperf180.o:OK
#3/8 pyperf600.o:OK
#3/9 pyperf600_nounroll.o:OK
#3/10 loop1.o:OK
#3/11 loop2.o:OK
#3/12 strobemeta.o:OK
#3/13 strobemeta_nounroll1.o:OK
#3/14 strobemeta_nounroll2.o:OK
#3/15 test_sysctl_loop1.o:OK
#3/16 test_sysctl_loop2.o:OK
#3/17 test_xdp_loop.o:OK
#3/18 test_seg6_loop.o:OK
#3 bpf_verif_scale:OK
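The table-driven sub-test pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the test_progs code itself: `start_subtest()`, `fake_load()`, `run_subtests()` and `struct subtest_def` are hypothetical stand-ins, and the substring filter is only an assumption about how name selection might behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sub-test description; loosely modeled on a table of object files. */
struct subtest_def {
	const char *name;
	bool expect_fail;	/* the load is expected to be rejected */
};

const char *subtest_filter;	/* NULL selects every sub-test */
int subtest_num;		/* number of the most recently started sub-test */

/* Returns true if the sub-test should run; numbering advances either way. */
bool start_subtest(const char *name)
{
	subtest_num++;
	return !subtest_filter || strstr(name, subtest_filter) != NULL;
}

/* Hypothetical loader: pretend only "loop3.o" is rejected. */
int fake_load(const char *file)
{
	return strcmp(file, "loop3.o") == 0 ? -1 : 0;
}

/* Runs the selected sub-tests and returns the number of unexpected outcomes. */
int run_subtests(const struct subtest_def *tests, size_t cnt, const char *filter)
{
	int errors = 0;
	size_t i;

	assert(tests != NULL || cnt == 0);

	subtest_filter = filter;
	subtest_num = 0;
	for (i = 0; i < cnt; i++) {
		int err;

		if (!start_subtest(tests[i].name))
			continue;

		err = fake_load(tests[i].name);
		/* An outcome is an error only if it differs from the expectation. */
		if ((err != 0) != tests[i].expect_fail)
			errors++;
	}
	return errors;
}
```

Keeping the numbering independent of the filter means a sub-test keeps the same number whether or not its siblings are selected, which is what makes output such as `#3/5` stable across runs.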
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
.../bpf/prog_tests/bpf_verif_scale.c | 77 ++++++++++---------
1 file changed, 40 insertions(+), 37 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index b59017279e0b..b4be96162ff4 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -33,14 +33,25 @@ static int check_load(const char *file, enum bpf_prog_type type)
return err;
}
+struct scale_test_def {
+ const char *file;
+ enum bpf_prog_type attach_type;
+ bool fails;
+};
+
void test_bpf_verif_scale(void)
{
- const char *sched_cls[] = {
- "./test_verif_scale1.o", "./test_verif_scale2.o", "./test_verif_scale3.o",
- };
- const char *raw_tp[] = {
+ struct scale_test_def tests[] = {
+ { "loop3.o", BPF_PROG_TYPE_RAW_TRACEPOINT, true /* fails */ },
+
+ { "test_verif_scale1.o", BPF_PROG_TYPE_SCHED_CLS },
+ { "test_verif_scale2.o", BPF_PROG_TYPE_SCHED_CLS },
+ { "test_verif_scale3.o", BPF_PROG_TYPE_SCHED_CLS },
+
/* full unroll by llvm */
- "./pyperf50.o", "./pyperf100.o", "./pyperf180.o",
+ { "pyperf50.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
+ { "pyperf100.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
+ { "pyperf180.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
/* partial unroll. llvm will unroll loop ~150 times.
* C loop count -> 600.
@@ -48,7 +59,7 @@ void test_bpf_verif_scale(void)
* 16k insns in loop body.
* Total of 5 such loops. Total program size ~82k insns.
*/
- "./pyperf600.o",
+ { "pyperf600.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
/* no unroll at all.
* C loop count -> 600.
@@ -56,22 +67,26 @@ void test_bpf_verif_scale(void)
* ~110 insns in loop body.
* Total of 5 such loops. Total program size ~1500 insns.
*/
- "./pyperf600_nounroll.o",
+ { "pyperf600_nounroll.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
- "./loop1.o", "./loop2.o",
+ { "loop1.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
+ { "loop2.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
/* partial unroll. 19k insn in a loop.
* Total program size 20.8k insn.
* ~350k processed_insns
*/
- "./strobemeta.o",
+ { "strobemeta.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
/* no unroll, tiny loops */
- "./strobemeta_nounroll1.o",
- "./strobemeta_nounroll2.o",
- };
- const char *cg_sysctl[] = {
- "./test_sysctl_loop1.o", "./test_sysctl_loop2.o",
+ { "strobemeta_nounroll1.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
+ { "strobemeta_nounroll2.o", BPF_PROG_TYPE_RAW_TRACEPOINT },
+
+ { "test_sysctl_loop1.o", BPF_PROG_TYPE_CGROUP_SYSCTL },
+ { "test_sysctl_loop2.o", BPF_PROG_TYPE_CGROUP_SYSCTL },
+
+ { "test_xdp_loop.o", BPF_PROG_TYPE_XDP },
+ { "test_seg6_loop.o", BPF_PROG_TYPE_LWT_SEG6LOCAL },
};
libbpf_print_fn_t old_print_fn = NULL;
int err, i;
@@ -81,33 +96,21 @@ void test_bpf_verif_scale(void)
old_print_fn = libbpf_set_print(libbpf_debug_print);
}
- err = check_load("./loop3.o", BPF_PROG_TYPE_RAW_TRACEPOINT);
- test__printf("test_scale:loop3:%s\n",
- err ? (error_cnt--, "OK") : "FAIL");
+ for (i = 0; i < ARRAY_SIZE(tests); i++) {
+ const struct scale_test_def *test = &tests[i];
- for (i = 0; i < ARRAY_SIZE(sched_cls); i++) {
- err = check_load(sched_cls[i], BPF_PROG_TYPE_SCHED_CLS);
- test__printf("test_scale:%s:%s\n", sched_cls[i],
- err ? "FAIL" : "OK");
- }
+ if (!test__start_subtest(test->file))
+ continue;
- for (i = 0; i < ARRAY_SIZE(raw_tp); i++) {
- err = check_load(raw_tp[i], BPF_PROG_TYPE_RAW_TRACEPOINT);
- test__printf("test_scale:%s:%s\n", raw_tp[i],
- err ? "FAIL" : "OK");
+ err = check_load(test->file, test->attach_type);
+ if (test->fails) { /* expected to fail */
+ if (err)
+ error_cnt--;
+ else
+ error_cnt++;
+ }
}
- for (i = 0; i < ARRAY_SIZE(cg_sysctl); i++) {
- err = check_load(cg_sysctl[i], BPF_PROG_TYPE_CGROUP_SYSCTL);
- test__printf("test_scale:%s:%s\n", cg_sysctl[i],
- err ? "FAIL" : "OK");
- }
- err = check_load("./test_xdp_loop.o", BPF_PROG_TYPE_XDP);
- test__printf("test_scale:test_xdp_loop:%s\n", err ? "FAIL" : "OK");
-
- err = check_load("./test_seg6_loop.o", BPF_PROG_TYPE_LWT_SEG6LOCAL);
- test__printf("test_scale:test_seg6_loop:%s\n", err ? "FAIL" : "OK");
-
if (env.verifier_stats)
libbpf_set_print(old_print_fn);
}
--
2.17.1
^ permalink raw reply related
* [PATCH v3 bpf-next 9/9] selftests/bpf: convert send_signal.c to use subtests
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
Convert the send_signal set of tests to be exposed as three sub-tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/testing/selftests/bpf/prog_tests/send_signal.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index d950f4558897..461b423d0584 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -219,7 +219,10 @@ void test_send_signal(void)
{
int ret = 0;
- ret |= test_send_signal_tracepoint();
- ret |= test_send_signal_perf();
- ret |= test_send_signal_nmi();
+ if (test__start_subtest("send_signal_tracepoint"))
+ ret |= test_send_signal_tracepoint();
+ if (test__start_subtest("send_signal_perf"))
+ ret |= test_send_signal_perf();
+ if (test__start_subtest("send_signal_nmi"))
+ ret |= test_send_signal_nmi();
}
--
2.17.1
^ permalink raw reply related
* [PATCH v3 bpf-next 6/9] selftests/bpf: abstract away test log output
From: Andrii Nakryiko @ 2019-07-28 3:25 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, sdf
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
This patch changes how test output is printed. By default, if a test
had no errors, the only output will be a single line with the test number,
name, and verdict at the end, e.g.:
#31 xdp:OK
If a test had any errors, all log output captured during test execution
will be printed after the test completes.
It's possible to force log output with the `-v` (`--verbose`) option, in
which case output won't be buffered and will be printed immediately.
To support this, individual tests are required to use helper functions for
logging: `test__printf()` and `test__vprintf()`.
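The buffering behavior described above can be illustrated with a small, self-contained sketch. It is not the code from this patch: `struct log_buf`, `log_vappend()` and `log_append()` are hypothetical names, and the sketch measures the formatted length first rather than using a format-then-retry loop. What it does share with the patch is the point that a `va_list` must be copied with `va_copy()` before each use when it may be consumed more than once.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable log buffer. */
struct log_buf {
	char *data;
	size_t len;	/* bytes used, excluding the terminating NUL */
	size_t cap;	/* usable bytes, excluding the terminating NUL */
};

/* Appends formatted text to the buffer; returns 0 on success, -1 on failure. */
int log_vappend(struct log_buf *log, const char *fmt, va_list args)
{
	va_list ap;
	int need;

	assert(log != NULL && fmt != NULL);

	/* First pass only measures; use a copy so that args stays usable. */
	va_copy(ap, args);
	need = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (need < 0)
		return -1;

	if (!log->data || log->len + (size_t)need > log->cap) {
		size_t new_cap = log->cap * 3 / 2;
		char *new_data;

		if (new_cap < 4096)
			new_cap = 4096;
		if (new_cap < log->len + (size_t)need)
			new_cap = log->len + (size_t)need;

		/* +1 guarantees room for the terminating NUL */
		new_data = realloc(log->data, new_cap + 1);
		if (!new_data)
			return -1;
		log->data = new_data;
		log->cap = new_cap;
	}

	/* Second pass formats into the buffer, again through a fresh copy. */
	va_copy(ap, args);
	vsnprintf(log->data + log->len, log->cap - log->len + 1, fmt, ap);
	va_end(ap);

	log->len += (size_t)need;
	return 0;
}

/* Variadic convenience wrapper around log_vappend(). */
int log_append(struct log_buf *log, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = log_vappend(log, fmt, args);
	va_end(args);
	return ret;
}
```

A runner built on this idea would print `data` only when the test failed or verbose output was requested, and reset `len` to zero before the next test.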
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
.../selftests/bpf/prog_tests/bpf_obj_id.c | 6 +-
.../bpf/prog_tests/bpf_verif_scale.c | 31 ++--
.../bpf/prog_tests/get_stack_raw_tp.c | 4 +-
.../selftests/bpf/prog_tests/l4lb_all.c | 2 +-
.../selftests/bpf/prog_tests/map_lock.c | 10 +-
.../selftests/bpf/prog_tests/send_signal.c | 8 +-
.../selftests/bpf/prog_tests/spinlock.c | 2 +-
.../bpf/prog_tests/stacktrace_build_id.c | 4 +-
.../bpf/prog_tests/stacktrace_build_id_nmi.c | 4 +-
.../selftests/bpf/prog_tests/xdp_noinline.c | 3 +-
tools/testing/selftests/bpf/test_progs.c | 145 ++++++++++++++----
tools/testing/selftests/bpf/test_progs.h | 37 ++++-
12 files changed, 183 insertions(+), 73 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
index cb827383db4d..fb5840a62548 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
@@ -106,8 +106,8 @@ void test_bpf_obj_id(void)
if (CHECK(err ||
prog_infos[i].type != BPF_PROG_TYPE_SOCKET_FILTER ||
info_len != sizeof(struct bpf_prog_info) ||
- (jit_enabled && !prog_infos[i].jited_prog_len) ||
- (jit_enabled &&
+ (env.jit_enabled && !prog_infos[i].jited_prog_len) ||
+ (env.jit_enabled &&
!memcmp(jited_insns, zeros, sizeof(zeros))) ||
!prog_infos[i].xlated_prog_len ||
!memcmp(xlated_insns, zeros, sizeof(zeros)) ||
@@ -121,7 +121,7 @@ void test_bpf_obj_id(void)
err, errno, i,
prog_infos[i].type, BPF_PROG_TYPE_SOCKET_FILTER,
info_len, sizeof(struct bpf_prog_info),
- jit_enabled,
+ env.jit_enabled,
prog_infos[i].jited_prog_len,
prog_infos[i].xlated_prog_len,
!!memcmp(jited_insns, zeros, sizeof(zeros)),
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
index ceddb8cc86f4..b59017279e0b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
@@ -4,12 +4,15 @@
static int libbpf_debug_print(enum libbpf_print_level level,
const char *format, va_list args)
{
- if (level != LIBBPF_DEBUG)
- return vfprintf(stderr, format, args);
+ if (level != LIBBPF_DEBUG) {
+ test__vprintf(format, args);
+ return 0;
+ }
if (!strstr(format, "verifier log"))
return 0;
- return vfprintf(stderr, "%s", args);
+ test__vprintf("%s", args);
+ return 0;
}
static int check_load(const char *file, enum bpf_prog_type type)
@@ -73,32 +76,38 @@ void test_bpf_verif_scale(void)
libbpf_print_fn_t old_print_fn = NULL;
int err, i;
- if (verifier_stats)
+ if (env.verifier_stats) {
+ test__force_log();
old_print_fn = libbpf_set_print(libbpf_debug_print);
+ }
err = check_load("./loop3.o", BPF_PROG_TYPE_RAW_TRACEPOINT);
- printf("test_scale:loop3:%s\n", err ? (error_cnt--, "OK") : "FAIL");
+ test__printf("test_scale:loop3:%s\n",
+ err ? (error_cnt--, "OK") : "FAIL");
for (i = 0; i < ARRAY_SIZE(sched_cls); i++) {
err = check_load(sched_cls[i], BPF_PROG_TYPE_SCHED_CLS);
- printf("test_scale:%s:%s\n", sched_cls[i], err ? "FAIL" : "OK");
+ test__printf("test_scale:%s:%s\n", sched_cls[i],
+ err ? "FAIL" : "OK");
}
for (i = 0; i < ARRAY_SIZE(raw_tp); i++) {
err = check_load(raw_tp[i], BPF_PROG_TYPE_RAW_TRACEPOINT);
- printf("test_scale:%s:%s\n", raw_tp[i], err ? "FAIL" : "OK");
+ test__printf("test_scale:%s:%s\n", raw_tp[i],
+ err ? "FAIL" : "OK");
}
for (i = 0; i < ARRAY_SIZE(cg_sysctl); i++) {
err = check_load(cg_sysctl[i], BPF_PROG_TYPE_CGROUP_SYSCTL);
- printf("test_scale:%s:%s\n", cg_sysctl[i], err ? "FAIL" : "OK");
+ test__printf("test_scale:%s:%s\n", cg_sysctl[i],
+ err ? "FAIL" : "OK");
}
err = check_load("./test_xdp_loop.o", BPF_PROG_TYPE_XDP);
- printf("test_scale:test_xdp_loop:%s\n", err ? "FAIL" : "OK");
+ test__printf("test_scale:test_xdp_loop:%s\n", err ? "FAIL" : "OK");
err = check_load("./test_seg6_loop.o", BPF_PROG_TYPE_LWT_SEG6LOCAL);
- printf("test_scale:test_seg6_loop:%s\n", err ? "FAIL" : "OK");
+ test__printf("test_scale:test_seg6_loop:%s\n", err ? "FAIL" : "OK");
- if (verifier_stats)
+ if (env.verifier_stats)
libbpf_set_print(old_print_fn);
}
diff --git a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
index 9d73a8f932ac..3d59b3c841fe 100644
--- a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
@@ -41,7 +41,7 @@ static void get_stack_print_output(void *ctx, int cpu, void *data, __u32 size)
* just assume it is good if the stack is not empty.
* This could be improved in the future.
*/
- if (jit_enabled) {
+ if (env.jit_enabled) {
found = num_stack > 0;
} else {
for (i = 0; i < num_stack; i++) {
@@ -58,7 +58,7 @@ static void get_stack_print_output(void *ctx, int cpu, void *data, __u32 size)
}
} else {
num_stack = e->kern_stack_size / sizeof(__u64);
- if (jit_enabled) {
+ if (env.jit_enabled) {
good_kern_stack = num_stack > 0;
} else {
for (i = 0; i < num_stack; i++) {
diff --git a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
index 20ddca830e68..5ce572c03a5f 100644
--- a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
+++ b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
@@ -74,7 +74,7 @@ static void test_l4lb(const char *file)
}
if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
error_cnt++;
- printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
+ test__printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
}
out:
bpf_object__close(obj);
diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c
index ee99368c595c..2e78217ed3fd 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c
@@ -9,12 +9,12 @@ static void *parallel_map_access(void *arg)
for (i = 0; i < 10000; i++) {
err = bpf_map_lookup_elem_flags(map_fd, &key, vars, BPF_F_LOCK);
if (err) {
- printf("lookup failed\n");
+ test__printf("lookup failed\n");
error_cnt++;
goto out;
}
if (vars[0] != 0) {
- printf("lookup #%d var[0]=%d\n", i, vars[0]);
+ test__printf("lookup #%d var[0]=%d\n", i, vars[0]);
error_cnt++;
goto out;
}
@@ -22,8 +22,8 @@ static void *parallel_map_access(void *arg)
for (j = 2; j < 17; j++) {
if (vars[j] == rnd)
continue;
- printf("lookup #%d var[1]=%d var[%d]=%d\n",
- i, rnd, j, vars[j]);
+ test__printf("lookup #%d var[1]=%d var[%d]=%d\n",
+ i, rnd, j, vars[j]);
error_cnt++;
goto out;
}
@@ -43,7 +43,7 @@ void test_map_lock(void)
err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
if (err) {
- printf("test_map_lock:bpf_prog_load errno %d\n", errno);
+ test__printf("test_map_lock:bpf_prog_load errno %d\n", errno);
goto close_prog;
}
map_fd[0] = bpf_find_map(__func__, obj, "hash_map");
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
index 54218ee3c004..d950f4558897 100644
--- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -202,8 +202,8 @@ static int test_send_signal_nmi(void)
-1 /* cpu */, -1 /* group_fd */, 0 /* flags */);
if (pmu_fd == -1) {
if (errno == ENOENT) {
- printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
- __func__);
+ test__printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
+ __func__);
return 0;
}
/* Let the test fail with a more informative message */
@@ -222,8 +222,4 @@ void test_send_signal(void)
ret |= test_send_signal_tracepoint();
ret |= test_send_signal_perf();
ret |= test_send_signal_nmi();
- if (!ret)
- printf("test_send_signal:OK\n");
- else
- printf("test_send_signal:FAIL\n");
}
diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
index 114ebe6a438e..deb2db5b85b0 100644
--- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
+++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c
@@ -12,7 +12,7 @@ void test_spinlock(void)
err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
if (err) {
- printf("test_spin_lock:bpf_prog_load errno %d\n", errno);
+ test__printf("test_spin_lock:bpf_prog_load errno %d\n", errno);
goto close_prog;
}
for (i = 0; i < 4; i++)
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
index ac44fda84833..356d2c017a9c 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
@@ -109,8 +109,8 @@ void test_stacktrace_build_id(void)
if (build_id_matches < 1 && retry--) {
bpf_link__destroy(link);
bpf_object__close(obj);
- printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
- __func__);
+ test__printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
+ __func__);
goto retry;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
index 9557b7dfb782..f44f2c159714 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
@@ -140,8 +140,8 @@ void test_stacktrace_build_id_nmi(void)
if (build_id_matches < 1 && retry--) {
bpf_link__destroy(link);
bpf_object__close(obj);
- printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
- __func__);
+ test__printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
+ __func__);
goto retry;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
index 09e6b46f5515..b5404494b8aa 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
@@ -75,7 +75,8 @@ void test_xdp_noinline(void)
}
if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
error_cnt++;
- printf("test_xdp_noinline:FAIL:stats %lld %lld\n", bytes, pkts);
+ test__printf("test_xdp_noinline:FAIL:stats %lld %lld\n",
+ bytes, pkts);
}
out:
bpf_object__close(obj);
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 94b6951b90b3..1b7470d3da22 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -6,9 +6,85 @@
#include <argp.h>
#include <string.h>
+/* defined in test_progs.h */
+struct test_env env = {
+ .test_num_selector = -1,
+};
int error_cnt, pass_cnt;
-bool jit_enabled;
-bool verifier_stats = false;
+
+struct prog_test_def {
+ const char *test_name;
+ int test_num;
+ void (*run_test)(void);
+ bool force_log;
+ int pass_cnt;
+ int error_cnt;
+ bool tested;
+};
+
+void test__force_log() {
+ env.test->force_log = true;
+}
+
+void test__vprintf(const char *fmt, va_list args)
+{
+ size_t rem_sz;
+ int ret = 0;
+
+ if (env.verbose || (env.test && env.test->force_log)) {
+ vfprintf(stderr, fmt, args);
+ return;
+ }
+
+try_again:
+ rem_sz = env.log_cap - env.log_cnt;
+ if (rem_sz) {
+ va_list ap;
+
+ va_copy(ap, args);
+ /* we reserved extra byte for \0 at the end */
+ ret = vsnprintf(env.log_buf + env.log_cnt, rem_sz + 1, fmt, ap);
+ va_end(ap);
+
+ if (ret < 0) {
+ env.log_buf[env.log_cnt] = '\0';
+ fprintf(stderr, "failed to log w/ fmt '%s'\n", fmt);
+ return;
+ }
+ }
+
+ if (!rem_sz || ret > rem_sz) {
+ size_t new_sz = env.log_cap * 3 / 2;
+ char *new_buf;
+
+ if (new_sz < 4096)
+ new_sz = 4096;
+ if (new_sz < ret + env.log_cnt)
+ new_sz = ret + env.log_cnt;
+
+ /* +1 for guaranteed space for terminating \0 */
+ new_buf = realloc(env.log_buf, new_sz + 1);
+ if (!new_buf) {
+ fprintf(stderr, "failed to realloc log buffer: %d\n",
+ errno);
+ return;
+ }
+ env.log_buf = new_buf;
+ env.log_cap = new_sz;
+ goto try_again;
+ }
+
+ env.log_cnt += ret;
+}
+
+void test__printf(const char *fmt, ...)
+{
+ va_list args;
+
+ va_start(args, fmt);
+ test__vprintf(fmt, args);
+ va_end(args);
+}
struct ipv4_packet pkt_v4 = {
.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
@@ -163,20 +239,15 @@ void *spin_lock_thread(void *arg)
#include <prog_tests/tests.h>
#undef DEFINE_TEST
-struct prog_test_def {
- const char *test_name;
- int test_num;
- void (*run_test)(void);
-};
-
static struct prog_test_def prog_test_defs[] = {
-#define DEFINE_TEST(name) { \
- .test_name = #name, \
- .run_test = &test_##name, \
+#define DEFINE_TEST(name) { \
+ .test_name = #name, \
+ .run_test = &test_##name, \
},
#include <prog_tests/tests.h>
#undef DEFINE_TEST
};
+const int prog_test_cnt = ARRAY_SIZE(prog_test_defs);
const char *argp_program_version = "test_progs 0.1";
const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
@@ -186,7 +257,6 @@ enum ARG_KEYS {
ARG_TEST_NUM = 'n',
ARG_TEST_NAME = 't',
ARG_VERIFIER_STATS = 's',
-
ARG_VERBOSE = 'v',
};
@@ -202,24 +272,13 @@ static const struct argp_option opts[] = {
{},
};
-struct test_env {
- int test_num_selector;
- const char *test_name_selector;
- bool verifier_stats;
- bool verbose;
- bool very_verbose;
-};
-
-static struct test_env env = {
- .test_num_selector = -1,
-};
-
static int libbpf_print_fn(enum libbpf_print_level level,
const char *format, va_list args)
{
if (!env.very_verbose && level == LIBBPF_DEBUG)
return 0;
- return vfprintf(stderr, format, args);
+ test__vprintf(format, args);
+ return 0;
}
static error_t parse_arg(int key, char *arg, struct argp_state *state)
@@ -267,7 +326,6 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
return 0;
}
-
int main(int argc, char **argv)
{
static const struct argp argp = {
@@ -275,7 +333,6 @@ int main(int argc, char **argv)
.parser = parse_arg,
.doc = argp_program_doc,
};
- struct prog_test_def *test;
int err, i;
err = argp_parse(&argp, argc, argv, 0, NULL, &env);
@@ -286,13 +343,14 @@ int main(int argc, char **argv)
srand(time(NULL));
- jit_enabled = is_jit_enabled();
+ env.jit_enabled = is_jit_enabled();
- verifier_stats = env.verifier_stats;
-
- for (i = 0; i < ARRAY_SIZE(prog_test_defs); i++) {
- test = &prog_test_defs[i];
+ for (i = 0; i < prog_test_cnt; i++) {
+ struct prog_test_def *test = &prog_test_defs[i];
+ int old_pass_cnt = pass_cnt;
+ int old_error_cnt = error_cnt;
+ env.test = test;
test->test_num = i + 1;
if (env.test_num_selector >= 0 &&
@@ -303,8 +361,29 @@ int main(int argc, char **argv)
continue;
test->run_test();
+ test->tested = true;
+ test->pass_cnt = pass_cnt - old_pass_cnt;
+ test->error_cnt = error_cnt - old_error_cnt;
+ if (test->error_cnt)
+ env.fail_cnt++;
+ else
+ env.succ_cnt++;
+
+ if (env.verbose || test->force_log || test->error_cnt) {
+ if (env.log_cnt) {
+ fprintf(stdout, "%s", env.log_buf);
+ if (env.log_buf[env.log_cnt - 1] != '\n')
+ fprintf(stdout, "\n");
+ }
+ }
+ env.log_cnt = 0;
+
+ printf("#%d %s:%s\n", test->test_num, test->test_name,
+ test->error_cnt ? "FAIL" : "OK");
}
+ printf("Summary: %d PASSED, %d FAILED\n", env.succ_cnt, env.fail_cnt);
+
+ free(env.log_buf);
- printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
}
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 49e0f7d85643..62f55a4231e9 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -38,9 +38,33 @@ typedef __u16 __sum16;
#include "trace_helpers.h"
#include "flow_dissector_load.h"
-extern int error_cnt, pass_cnt;
-extern bool jit_enabled;
-extern bool verifier_stats;
+struct prog_test_def;
+
+struct test_env {
+ int test_num_selector;
+ const char *test_name_selector;
+ bool verifier_stats;
+ bool verbose;
+ bool very_verbose;
+
+ bool jit_enabled;
+
+ struct prog_test_def *test;
+ char *log_buf;
+ size_t log_cnt;
+ size_t log_cap;
+
+ int succ_cnt;
+ int fail_cnt;
+};
+
+extern int error_cnt;
+extern int pass_cnt;
+extern struct test_env env;
+
+extern void test__printf(const char *fmt, ...);
+extern void test__vprintf(const char *fmt, va_list args);
+extern void test__force_log();
#define MAGIC_BYTES 123
@@ -64,11 +88,12 @@ extern struct ipv6_packet pkt_v6;
int __ret = !!(condition); \
if (__ret) { \
error_cnt++; \
- printf("%s:FAIL:%s ", __func__, tag); \
- printf(format); \
+ test__printf("%s:FAIL:%s ", __func__, tag); \
+ test__printf(format); \
} else { \
pass_cnt++; \
- printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\
+ test__printf("%s:PASS:%s %d nsec\n", \
+ __func__, tag, duration); \
} \
__ret; \
})
--
2.17.1
^ permalink raw reply related
* Re: general protection fault in tls_trim_both_msgs
From: syzbot @ 2019-07-28 3:46 UTC (permalink / raw)
To: ast, aviadye, borisp, bpf, corbet, daniel, davejwatson, davem,
jakub.kicinski, john.fastabend, kafai, linux-doc, linux-kernel,
netdev, songliubraving, syzkaller-bugs, yhs
In-Reply-To: <0000000000002b4896058e7abf78@google.com>
syzbot has bisected this bug to:
commit 32857cf57f920cdc03b5095f08febec94cf9c36b
Author: John Fastabend <john.fastabend@gmail.com>
Date: Fri Jul 19 17:29:18 2019 +0000
net/tls: fix transition through disconnect with close
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=155064d8600000
start commit: fde50b96 Add linux-next specific files for 20190726
git tree: linux-next
final crash: https://syzkaller.appspot.com/x/report.txt?x=175064d8600000
console output: https://syzkaller.appspot.com/x/log.txt?x=135064d8600000
kernel config: https://syzkaller.appspot.com/x/.config?x=4b58274564b354c1
dashboard link: https://syzkaller.appspot.com/bug?extid=0e0fedcad708d12d3032
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14779d64600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1587c842600000
Reported-by: syzbot+0e0fedcad708d12d3032@syzkaller.appspotmail.com
Fixes: 32857cf57f92 ("net/tls: fix transition through disconnect with close")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply
* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-28 4:06 UTC (permalink / raw)
To: Himadri Pandya
Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>
Hi Himadri,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
include/linux/sched.h:609:43: sparse: sparse: bad integer constant expression
include/linux/sched.h:609:73: sparse: sparse: invalid named zero-width bitfield `value'
include/linux/sched.h:610:43: sparse: sparse: bad integer constant expression
include/linux/sched.h:610:67: sparse: sparse: invalid named zero-width bitfield `bucket_id'
net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: right side has type int
net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: right side has type int
net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: bad constant expression type
net/vmw_vsock/hyperv_transport.c:387:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:388:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:465:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:466:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:666:9: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
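
Note on the warnings above: they point at uses of HV_HYP_PAGE_SIZE, including
the ALIGN(..., HV_HYP_PAGE_SIZE) calls in hvs_open_connection() shown in the
listing below. The "undefined identifier" / "cast from unknown type" pairs are
most likely repeated because ALIGN() expands its second argument more than once
and casts it to the type of the first. The arithmetic itself can be sketched in
user space as follows. This is only an illustration: HYP_PAGE_SIZE,
RING_MIN_SIZE, RING_MAX_SIZE, ALIGN_UP() and ring_size_hint() are hypothetical
stand-ins with assumed values, not the kernel's definitions.

```c
#include <assert.h>

/* Assumed values for illustration only; the real constants come from
 * kernel headers and may differ.
 */
#define HYP_PAGE_SIZE	4096
#define RING_MIN_SIZE	(HYP_PAGE_SIZE * 6)
#define RING_MAX_SIZE	(HYP_PAGE_SIZE * 64)

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))

/* Clamp a socket buffer size into [RING_MIN_SIZE, RING_MAX_SIZE] and
 * round the result up to a whole number of pages, mirroring the
 * max_t()/min_t()/ALIGN() sequence in the listing.
 */
static int ring_size_hint(int sk_buf)
{
	int size = sk_buf > RING_MIN_SIZE ? sk_buf : RING_MIN_SIZE;

	/* The mask trick in ALIGN_UP() only works for powers of two. */
	assert((HYP_PAGE_SIZE & (HYP_PAGE_SIZE - 1)) == 0);

	if (size > RING_MAX_SIZE)
		size = RING_MAX_SIZE;

	return ALIGN_UP(size, HYP_PAGE_SIZE);
}
```

With these assumed constants, a request below the minimum is raised to six
pages, a request between the bounds is rounded up to the next page multiple,
and an oversized request is capped at 64 pages.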
vim +214 net/vmw_vsock/hyperv_transport.c
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 59
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 60 struct hvs_send_buf {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 61 /* The header before the payload data */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 62 struct vmpipe_proto_header hdr;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 63
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 64 /* The payload */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 @65 u8 data[HVS_SEND_BUF_SIZE];
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 66 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 67
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 68 #define HVS_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 69 sizeof(struct vmpipe_proto_header))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 70
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 71 /* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write(), and
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 72 * __hv_pkt_iter_next().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 73 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 74 #define VMBUS_PKT_TRAILER_SIZE (sizeof(u64))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 75
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 76 #define HVS_PKT_LEN(payload_len) (HVS_HEADER_LEN + \
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 77 ALIGN((payload_len), 8) + \
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 78 VMBUS_PKT_TRAILER_SIZE)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 79
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 80 union hvs_service_id {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 81 uuid_le srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 82
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 83 struct {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 84 unsigned int svm_port;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 85 unsigned char b[sizeof(uuid_le) - sizeof(unsigned int)];
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 86 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 87 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 88
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 89 /* Per-socket state (accessed via vsk->trans) */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 90 struct hvsock {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 91 struct vsock_sock *vsk;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 92
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 93 uuid_le vm_srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 94 uuid_le host_srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 95
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 96 struct vmbus_channel *chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 97 struct vmpacket_descriptor *recv_desc;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 98
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 99 /* The length of the payload not delivered to userland yet */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 100 u32 recv_data_len;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 101 /* The offset of the payload */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 102 u32 recv_data_off;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 103
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 104 /* Have we sent the zero-length packet (FIN)? */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 105 bool fin_sent;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 106 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 107
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 108 /* In the VM, we support Hyper-V Sockets with AF_VSOCK, and the endpoint is
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 109 * <cid, port> (see struct sockaddr_vm). Note: cid is not really used here:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 110 * when we write apps to connect to the host, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 111 * or VMADDR_CID_HOST (both are equivalent) as the remote cid, and when we
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 112 * write apps to bind() & listen() in the VM, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 113 * as the local cid.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 114 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 115 * On the host, Hyper-V Sockets are supported by Winsock AF_HYPERV:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 116 * https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 117 * guide/make-integration-service, and the endpoint is <VmID, ServiceId> with
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 118 * the below sockaddr:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 119 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 120 * struct SOCKADDR_HV
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 121 * {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 122 * ADDRESS_FAMILY Family;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 123 * USHORT Reserved;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 124 * GUID VmId;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 125 * GUID ServiceId;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 126 * };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 127 * Note: VmID is not used by Linux VM and actually it isn't transmitted via
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 128 * VMBus, because here it's obvious the host and the VM can easily identify
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 129 * each other. Though the VmID is useful on the host, especially in the case
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 130 * of Windows container, Linux VM doesn't need it at all.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 131 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 132 * To make use of the AF_VSOCK infrastructure in Linux VM, we have to limit
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 133 * the available GUID space of SOCKADDR_HV so that we can create a mapping
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 134 * between AF_VSOCK port and SOCKADDR_HV Service GUID. The rule of writing
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 135 * Hyper-V Sockets apps on the host and in Linux VM is:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 136 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 137 ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 138 * The only valid Service GUIDs, from the perspectives of both the host and *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 139 * Linux VM, that can be connected by the other end, must conform to this *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 140 * format: <port>-facb-11e6-bd58-64006a7986d3, and the "port" must be in *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 141 * this range [0, 0x7FFFFFFF]. *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 142 ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 143 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 144 * When we write apps on the host to connect(), the GUID ServiceID is used.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 145 * When we write apps in Linux VM to connect(), we only need to specify the
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 146 * port and the driver will form the GUID and use that to request the host.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 147 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 148 * From the perspective of Linux VM:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 149 * 1. the local ephemeral port (i.e. the local auto-bound port when we call
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 150 * connect() without explicit bind()) is generated by __vsock_bind_stream(),
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 151 * and the range is [1024, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 152 * 2. the remote ephemeral port (i.e. the auto-generated remote port for
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 153 * a connect request initiated by the host's connect()) is generated by
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 154 * hvs_remote_addr_init() and the range is [0x80000000, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 155 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 156
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 157 #define MAX_LISTEN_PORT ((u32)0x7FFFFFFF)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 158 #define MAX_VM_LISTEN_PORT MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 159 #define MAX_HOST_LISTEN_PORT MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 160 #define MIN_HOST_EPHEMERAL_PORT (MAX_HOST_LISTEN_PORT + 1)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 161
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 162 /* 00000000-facb-11e6-bd58-64006a7986d3 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 163 static const uuid_le srv_id_template =
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 164 UUID_LE(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 165 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 166
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 167 static bool is_valid_srv_id(const uuid_le *id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 168 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 169 return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 170 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 171
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 172 static unsigned int get_port_by_srv_id(const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 173 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 174 return *((unsigned int *)svr_id);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 175 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 176
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 177 static void hvs_addr_init(struct sockaddr_vm *addr, const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 178 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 179 unsigned int port = get_port_by_srv_id(svr_id);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 180
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 181 vsock_addr_init(addr, VMADDR_CID_ANY, port);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 182 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 183
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 184 static void hvs_remote_addr_init(struct sockaddr_vm *remote,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 185 struct sockaddr_vm *local)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 186 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 187 static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 188 struct sock *sk;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 189
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 190 vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 191
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 192 while (1) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 193 /* Wrap around ? */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 194 if (host_ephemeral_port < MIN_HOST_EPHEMERAL_PORT ||
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 195 host_ephemeral_port == VMADDR_PORT_ANY)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 196 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 197
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 198 remote->svm_port = host_ephemeral_port++;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 199
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 200 sk = vsock_find_connected_socket(remote, local);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 201 if (!sk) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 202 /* Found an available ephemeral port */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 203 return;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 204 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 205
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 206 /* Release refcnt got in vsock_find_connected_socket */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 207 sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 208 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 209 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 210
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 211 static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 212 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 213 set_channel_pending_send_size(chan,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 @214 HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 215
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 216 virt_mb();
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 217 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 218
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 219 static bool hvs_channel_readable(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 220 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 221 u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 222
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 223 /* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 224 return readable >= HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 225 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 226
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 227 static int hvs_channel_readable_payload(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 228 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 229 u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 230
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 231 if (readable > HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 232 /* At least we have 1 byte to read. We don't need to return
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 233 * the exact readable bytes: see vsock_stream_recvmsg() ->
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 234 * vsock_stream_has_data().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 235 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 236 return 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 237 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 238
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 239 if (readable == HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 240 /* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 241 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 242 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 243
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 244 /* No payload or FIN */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 245 return -1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 246 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 247
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 248 static size_t hvs_channel_writable_bytes(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 249 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 250 u32 writeable = hv_get_bytes_to_write(&chan->outbound);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 251 size_t ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 252
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 253 /* The ringbuffer mustn't be 100% full, and we should reserve a
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 254 * zero-length-payload packet for the FIN: see hv_ringbuffer_write()
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 255 * and hvs_shutdown().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 256 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 257 if (writeable <= HVS_PKT_LEN(1) + HVS_PKT_LEN(0))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 258 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 259
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 260 ret = writeable - HVS_PKT_LEN(1) - HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 261
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 262 return round_down(ret, 8);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 263 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 264
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 265 static int hvs_send_data(struct vmbus_channel *chan,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 266 struct hvs_send_buf *send_buf, size_t to_write)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 267 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 268 send_buf->hdr.pkt_type = 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 269 send_buf->hdr.data_size = to_write;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 270 return vmbus_sendpacket(chan, &send_buf->hdr,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 271 sizeof(send_buf->hdr) + to_write,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 272 0, VM_PKT_DATA_INBAND, 0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 273 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 274
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 275 static void hvs_channel_cb(void *ctx)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 276 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 277 struct sock *sk = (struct sock *)ctx;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 278 struct vsock_sock *vsk = vsock_sk(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 279 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 280 struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 281
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 282 if (hvs_channel_readable(chan))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 283 sk->sk_data_ready(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 284
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 285 if (hv_get_bytes_to_write(&chan->outbound) > 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 286 sk->sk_write_space(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 287 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 288
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 289 static void hvs_do_close_lock_held(struct vsock_sock *vsk,
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 290 bool cancel_timeout)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 291 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 292 struct sock *sk = sk_vsock(vsk);
b4562ca7925a3be Dexuan Cui 2017-10-19 293
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 294 sock_set_flag(sk, SOCK_DONE);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 295 vsk->peer_shutdown = SHUTDOWN_MASK;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 296 if (vsock_stream_has_data(vsk) <= 0)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 297 sk->sk_state = TCP_CLOSING;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 298 sk->sk_state_change(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 299 if (vsk->close_work_scheduled &&
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 300 (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 301 vsk->close_work_scheduled = false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 302 vsock_remove_sock(vsk);
b4562ca7925a3be Dexuan Cui 2017-10-19 303
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 304 /* Release the reference taken while scheduling the timeout */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 305 sock_put(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 306 }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 307 }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 308
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 309 static void hvs_close_connection(struct vmbus_channel *chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 310 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 311 struct sock *sk = get_per_channel_state(chan);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 312
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 313 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 314 hvs_do_close_lock_held(vsock_sk(sk), true);
b4562ca7925a3be Dexuan Cui 2017-10-19 315 release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 316 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 317
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 318 static void hvs_open_connection(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 319 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 320 uuid_le *if_instance, *if_type;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 321 unsigned char conn_from_host;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 322
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 323 struct sockaddr_vm addr;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 324 struct sock *sk, *new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 325 struct vsock_sock *vnew = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 326 struct hvsock *hvs = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 327 struct hvsock *hvs_new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 328 int rcvbuf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 329 int ret;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 330 int sndbuf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 331
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 332 if_type = &chan->offermsg.offer.if_type;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 333 if_instance = &chan->offermsg.offer.if_instance;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 334 conn_from_host = chan->offermsg.offer.u.pipe.user_def[0];
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 335
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 336 /* The host or the VM should only listen on a port in
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 337 * [0, MAX_LISTEN_PORT]
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 338 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 339 if (!is_valid_srv_id(if_type) ||
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 340 get_port_by_srv_id(if_type) > MAX_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 341 return;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 342
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 343 hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 344 sk = vsock_find_bound_socket(&addr);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 345 if (!sk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 346 return;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 347
b4562ca7925a3be Dexuan Cui 2017-10-19 348 lock_sock(sk);
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 349 if ((conn_from_host && sk->sk_state != TCP_LISTEN) ||
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 350 (!conn_from_host && sk->sk_state != TCP_SYN_SENT))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 351 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 352
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 353 if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 354 if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 355 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 356
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 357 new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 358 sk->sk_type, 0);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 359 if (!new)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 360 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 361
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 362 new->sk_state = TCP_SYN_SENT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 363 vnew = vsock_sk(new);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 364 hvs_new = vnew->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 365 hvs_new->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 366 } else {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 367 hvs = vsock_sk(sk)->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 368 hvs->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 369 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 370
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 371 set_channel_read_mode(chan, HV_CALL_DIRECT);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 372
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 373 /* Use the socket buffer sizes as hints for the VMBUS ring size. For
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 374 * server side sockets, 'sk' is the parent socket and thus, this will
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 375 * allow the child sockets to inherit the size from the parent. Keep
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 376 * the mins to the default value and align to page size as per VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 377 * requirements.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 378 * For the max, the socket core library will limit the socket buffer
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 379 * size that can be set by the user, but, since currently, the hv_sock
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 380 * VMBUS ring buffer is physically contiguous allocation, restrict it
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 381 * further.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 382 * Older versions of hv_sock host side code cannot handle bigger VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 383 * ring buffer size. Use the version number to limit the change to newer
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 384 * versions.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 385 */
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 386 if (vmbus_proto_version < VERSION_WIN10_V5) {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 387 sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 388 rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 389 } else {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 @390 sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 391 sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya 2019-07-25 392 sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 393 rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 394 rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya 2019-07-25 395 rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 396 }
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 397
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 398 ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 399 conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 400 if (ret != 0) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 401 if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 402 hvs_new->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 403 sock_put(new);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 404 } else {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 405 hvs->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 406 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 407 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 408 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 409
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 410 set_per_channel_state(chan, conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 411 vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 412
cb359b60416701c Sunil Muthuswamy 2019-06-17 413 /* Set the pending send size to max packet size to always get
cb359b60416701c Sunil Muthuswamy 2019-06-17 414 * notifications from the host when there is enough writable space.
cb359b60416701c Sunil Muthuswamy 2019-06-17 415 * The host is optimized to send notifications only when the pending
cb359b60416701c Sunil Muthuswamy 2019-06-17 416 * size boundary is crossed, and not always.
cb359b60416701c Sunil Muthuswamy 2019-06-17 417 */
cb359b60416701c Sunil Muthuswamy 2019-06-17 418 hvs_set_channel_pending_send_size(chan);
cb359b60416701c Sunil Muthuswamy 2019-06-17 419
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 420 if (conn_from_host) {
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 421 new->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 422 sk->sk_ack_backlog++;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 423
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 424 hvs_addr_init(&vnew->local_addr, if_type);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 425 hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 426
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 427 hvs_new->vm_srv_id = *if_type;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 428 hvs_new->host_srv_id = *if_instance;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 429
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 430 vsock_insert_connected(vnew);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 431
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 432 vsock_enqueue_accept(sk, new);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 433 } else {
3b4477d2dcf2709 Stefan Hajnoczi 2017-10-05 434 sk->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 435 sk->sk_socket->state = SS_CONNECTED;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 436
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 437 vsock_insert_connected(vsock_sk(sk));
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 438 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 439
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 440 sk->sk_state_change(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 441
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 442 out:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 443 /* Release refcnt obtained when we called vsock_find_bound_socket() */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 444 sock_put(sk);
b4562ca7925a3be Dexuan Cui 2017-10-19 445
b4562ca7925a3be Dexuan Cui 2017-10-19 446 release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 447 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 448
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 449 static u32 hvs_get_local_cid(void)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 450 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 451 return VMADDR_CID_ANY;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 452 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 453
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 454 static int hvs_sock_init(struct vsock_sock *vsk, struct vsock_sock *psk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 455 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 456 struct hvsock *hvs;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 457 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 458
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 459 hvs = kzalloc(sizeof(*hvs), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 460 if (!hvs)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 461 return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 462
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 463 vsk->trans = hvs;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 464 hvs->vsk = vsk;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 465 sk->sk_sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 466 sk->sk_rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 467 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 468 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 469
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 470 static int hvs_connect(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 471 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 472 union hvs_service_id vm, host;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 473 struct hvsock *h = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 474
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 475 vm.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 476 vm.svm_port = vsk->local_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 477 h->vm_srv_id = vm.srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 478
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 479 host.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 480 host.svm_port = vsk->remote_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 481 h->host_srv_id = host.srv_id;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 482
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 483 return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 484 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 485
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 486 static void hvs_shutdown_lock_held(struct hvsock *hvs, int mode)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 487 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 488 struct vmpipe_proto_header hdr;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 489
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 490 if (hvs->fin_sent || !hvs->chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 491 return;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 492
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 493 /* It can't fail: see hvs_channel_writable_bytes(). */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 494 (void)hvs_send_data(hvs->chan, (struct hvs_send_buf *)&hdr, 0);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 495 hvs->fin_sent = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 496 }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 497
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 498 static int hvs_shutdown(struct vsock_sock *vsk, int mode)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 499 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 500 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 501
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 502 if (!(mode & SEND_SHUTDOWN))
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 503 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 504
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 505 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 506 hvs_shutdown_lock_held(vsk->trans, mode);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 507 release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 508 return 0;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 509 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 510
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 511 static void hvs_close_timeout(struct work_struct *work)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 512 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 513 struct vsock_sock *vsk =
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 514 container_of(work, struct vsock_sock, close_work.work);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 515 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 516
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 517 sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 518 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 519 if (!sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 520 hvs_do_close_lock_held(vsk, false);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 521
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 522 vsk->close_work_scheduled = false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 523 release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 524 sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 525 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 526
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 527 /* Returns true, if it is safe to remove socket; false otherwise */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 528 static bool hvs_close_lock_held(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 529 {
b4562ca7925a3be Dexuan Cui 2017-10-19 530 struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 531
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 532 if (!(sk->sk_state == TCP_ESTABLISHED ||
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 533 sk->sk_state == TCP_CLOSING))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 534 return true;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 535
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 536 if ((sk->sk_shutdown & SHUTDOWN_MASK) != SHUTDOWN_MASK)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 537 hvs_shutdown_lock_held(vsk->trans, SHUTDOWN_MASK);
b4562ca7925a3be Dexuan Cui 2017-10-19 538
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 539 if (sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 540 return true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 541
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 542 /* This reference will be dropped by the delayed close routine */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 543 sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 544 INIT_DELAYED_WORK(&vsk->close_work, hvs_close_timeout);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 545 vsk->close_work_scheduled = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 546 schedule_delayed_work(&vsk->close_work, HVS_CLOSE_TIMEOUT);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 547 return false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 548 }
b4562ca7925a3be Dexuan Cui 2017-10-19 549
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 550 static void hvs_release(struct vsock_sock *vsk)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 551 {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 552 struct sock *sk = sk_vsock(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 553 bool remove_sock;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 554
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 555 lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 556 remove_sock = hvs_close_lock_held(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 557 release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 558 if (remove_sock)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15 559 vsock_remove_sock(vsk);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 560 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 561
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 562 static void hvs_destruct(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 563 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 564 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 565 struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 566
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 567 if (chan)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 568 vmbus_hvsock_device_unregister(chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 569
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 570 kfree(hvs);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 571 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 572
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 573 static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 574 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 575 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 576 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 577
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 578 static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 579 size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 580 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 581 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 582 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 583
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 584 static int hvs_dgram_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 585 struct sockaddr_vm *remote, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 586 size_t dgram_len)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 587 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 588 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 589 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 590
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 591 static bool hvs_dgram_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 592 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 593 return false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 594 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 595
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 596 static int hvs_update_recv_data(struct hvsock *hvs)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 597 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 598 struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 599 u32 payload_len;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 600
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 601 recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 602 payload_len = recv_buf->hdr.data_size;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 603
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 604 if (payload_len > HVS_MTU_SIZE)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 605 return -EIO;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 606
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 607 if (payload_len == 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 608 hvs->vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 609
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 610 hvs->recv_data_len = payload_len;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 611 hvs->recv_data_off = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 612
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 613 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 614 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 615
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 616 static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 617 size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 618 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 619 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 620 bool need_refill = !hvs->recv_desc;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 621 struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 622 u32 to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 623 int ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 624
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 625 if (flags & MSG_PEEK)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 626 return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 627
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 628 if (need_refill) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 629 hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 630 ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 631 if (ret)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 632 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 633 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 634
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 635 recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 636 to_read = min_t(u32, len, hvs->recv_data_len);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 637 ret = memcpy_to_msg(msg, recv_buf->data + hvs->recv_data_off, to_read);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 638 if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 639 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 640
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 641 hvs->recv_data_len -= to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 642 if (hvs->recv_data_len == 0) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 643 hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 644 if (hvs->recv_desc) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 645 ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 646 if (ret)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 647 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 648 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 649 } else {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 650 hvs->recv_data_off += to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 651 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 652
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 653 return to_read;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 654 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 655
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 656 static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 657 size_t len)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 658 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 659 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 660 struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 661 struct hvs_send_buf *send_buf;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 662 ssize_t to_write, max_writable;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 663 ssize_t ret = 0;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 664 ssize_t bytes_written = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 665
31113cc83e30924 Himadri Pandya 2019-07-25 666 BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 667
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 668 send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 669 if (!send_buf)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 670 return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 671
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 672 /* Reader(s) could be draining data from the channel as we write.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 673 * Maximize bandwidth, by iterating until the channel is found to be
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 674 * full.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 675 */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 676 while (len) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 677 max_writable = hvs_channel_writable_bytes(chan);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 678 if (!max_writable)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 679 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 680 to_write = min_t(ssize_t, len, max_writable);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 681 to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 682 /* memcpy_from_msg is safe for loop as it advances the offsets
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 683 * within the message iterator.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 684 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 685 ret = memcpy_from_msg(send_buf->data, msg, to_write);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 686 if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 687 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 688
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 689 ret = hvs_send_data(hvs->chan, send_buf, to_write);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 690 if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 691 goto out;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 692
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 693 bytes_written += to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 694 len -= to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 695 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 696 out:
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 697 /* If any data has been sent, return that */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 698 if (bytes_written)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22 699 ret = bytes_written;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 700 kfree(send_buf);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 701 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 702 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 703
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 704 static s64 hvs_stream_has_data(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 705 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 706 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 707 s64 ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 708
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 709 if (hvs->recv_data_len > 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 710 return 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 711
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 712 switch (hvs_channel_readable_payload(hvs->chan)) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 713 case 1:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 714 ret = 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 715 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 716 case 0:
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 717 vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 718 ret = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 719 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 720 default: /* -1 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 721 ret = 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 722 break;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 723 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 724
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 725 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 726 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 727
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 728 static s64 hvs_stream_has_space(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 729 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 730 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 731
cb359b60416701c Sunil Muthuswamy 2019-06-17 732 return hvs_channel_writable_bytes(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 733 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 734
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 735 static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 736 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 737 return HVS_MTU_SIZE + 1;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 738 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 739
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 740 static bool hvs_stream_is_active(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 741 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 742 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 743
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 744 return hvs->chan != NULL;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 745 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 746
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 747 static bool hvs_stream_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 748 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 749 /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 750 * reserved as ephemeral ports, which are used as the host's ports
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 751 * when the host initiates connections.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 752 *
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 753 * Perform this check in the guest so an immediate error is produced
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 754 * instead of a timeout.
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 755 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 756 if (port > MAX_HOST_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 757 return false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 758
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 759 if (cid == VMADDR_CID_HOST)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 760 return true;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 761
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 762 return false;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 763 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 764
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 765 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 766 int hvs_notify_poll_in(struct vsock_sock *vsk, size_t target, bool *readable)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 767 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 768 struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 769
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 770 *readable = hvs_channel_readable(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 771 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 772 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 773
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 774 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 775 int hvs_notify_poll_out(struct vsock_sock *vsk, size_t target, bool *writable)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 776 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 777 *writable = hvs_stream_has_space(vsk) > 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 778
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 779 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 780 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 781
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 782 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 783 int hvs_notify_recv_init(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 784 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 785 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 786 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 787 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 788
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 789 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 790 int hvs_notify_recv_pre_block(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 791 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 792 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 793 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 794 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 795
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 796 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 797 int hvs_notify_recv_pre_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 798 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 799 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 800 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 801 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 802
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 803 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 804 int hvs_notify_recv_post_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 805 ssize_t copied, bool data_read,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 806 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 807 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 808 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 809 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 810
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 811 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 812 int hvs_notify_send_init(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 813 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 814 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 815 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 816 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 817
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 818 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 819 int hvs_notify_send_pre_block(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 820 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 821 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 822 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 823 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 824
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 825 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 826 int hvs_notify_send_pre_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 827 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 828 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 829 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 830 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 831
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 832 static
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 833 int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 834 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 835 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 836 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 837 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 838
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 839 static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 840 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 841 /* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 842 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 843
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 844 static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 845 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 846 /* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 847 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 848
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 849 static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 850 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 851 /* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 852 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 853
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 854 static u64 hvs_get_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 855 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 856 return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 857 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 858
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 859 static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 860 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 861 return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 862 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 863
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 864 static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 865 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 866 return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 867 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 868
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 869 static struct vsock_transport hvs_transport = {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 870 .get_local_cid = hvs_get_local_cid,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 871
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 872 .init = hvs_sock_init,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 873 .destruct = hvs_destruct,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 874 .release = hvs_release,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 875 .connect = hvs_connect,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 876 .shutdown = hvs_shutdown,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 877
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 878 .dgram_bind = hvs_dgram_bind,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 879 .dgram_dequeue = hvs_dgram_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 880 .dgram_enqueue = hvs_dgram_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 881 .dgram_allow = hvs_dgram_allow,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 882
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 883 .stream_dequeue = hvs_stream_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 884 .stream_enqueue = hvs_stream_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 885 .stream_has_data = hvs_stream_has_data,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 886 .stream_has_space = hvs_stream_has_space,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 887 .stream_rcvhiwat = hvs_stream_rcvhiwat,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 888 .stream_is_active = hvs_stream_is_active,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 889 .stream_allow = hvs_stream_allow,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 890
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 891 .notify_poll_in = hvs_notify_poll_in,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 892 .notify_poll_out = hvs_notify_poll_out,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 893 .notify_recv_init = hvs_notify_recv_init,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 894 .notify_recv_pre_block = hvs_notify_recv_pre_block,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 895 .notify_recv_pre_dequeue = hvs_notify_recv_pre_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 896 .notify_recv_post_dequeue = hvs_notify_recv_post_dequeue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 897 .notify_send_init = hvs_notify_send_init,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 898 .notify_send_pre_block = hvs_notify_send_pre_block,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 899 .notify_send_pre_enqueue = hvs_notify_send_pre_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 900 .notify_send_post_enqueue = hvs_notify_send_post_enqueue,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 901
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 902 .set_buffer_size = hvs_set_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 903 .set_min_buffer_size = hvs_set_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 904 .set_max_buffer_size = hvs_set_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 905 .get_buffer_size = hvs_get_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 906 .get_min_buffer_size = hvs_get_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 907 .get_max_buffer_size = hvs_get_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 908 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 909
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 910 static int hvs_probe(struct hv_device *hdev,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 911 const struct hv_vmbus_device_id *dev_id)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 912 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 913 struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 914
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 915 hvs_open_connection(chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 916
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 917 /* Always return success to suppress the unnecessary error message
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 918 * in vmbus_probe(): on error the host will rescind the device in
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 919 * 30 seconds and we can do cleanup at that time in
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 920 * vmbus_onoffer_rescind().
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 921 */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 922 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 923 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 924
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 925 static int hvs_remove(struct hv_device *hdev)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 926 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 927 struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 928
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 929 vmbus_close(chan);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 930
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 931 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 932 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 933
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 934 /* This isn't really used. See vmbus_match() and vmbus_probe() */
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 935 static const struct hv_vmbus_device_id id_table[] = {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 936 {},
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 937 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 938
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 939 static struct hv_driver hvs_drv = {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 940 .name = "hv_sock",
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 941 .hvsock = true,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 942 .id_table = id_table,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 943 .probe = hvs_probe,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 944 .remove = hvs_remove,
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 945 };
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 946
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 947 static int __init hvs_init(void)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 948 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 949 int ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 950
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 951 if (vmbus_proto_version < VERSION_WIN10)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 952 return -ENODEV;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 953
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 954 ret = vmbus_driver_register(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 955 if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 956 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 957
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 958 ret = vsock_core_init(&hvs_transport);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 959 if (ret) {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 960 vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 961 return ret;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 962 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 963
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 964 return 0;
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 965 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 966
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 967 static void __exit hvs_exit(void)
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 968 {
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 969 vsock_core_exit();
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 970 vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 971 }
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 972
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 973 module_init(hvs_init);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 974 module_exit(hvs_exit);
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 975
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 976 MODULE_DESCRIPTION("Hyper-V Sockets");
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 977 MODULE_VERSION("1.0.0");
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 978 MODULE_LICENSE("GPL");
ae0078fcf0a5eb3 Dexuan Cui 2017-08-26 979 MODULE_ALIAS_NETPROTO(PF_VSOCK);
:::::: The code at line 214 was first introduced by commit
:::::: ae0078fcf0a5eb3a8623bfb5f988262e0911fdb9 hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
:::::: TO: Dexuan Cui <decui@microsoft.com>
:::::: CC: David S. Miller <davem@davemloft.net>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [PATCH v3 bpf-next 0/9] Revamp test_progs as a test running framework
From: Alexei Starovoitov @ 2019-07-28 5:41 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
Stanislav Fomichev, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190728032531.2358749-1-andriin@fb.com>
On Sat, Jul 27, 2019 at 8:25 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> This patch set makes a number of changes to test_progs selftest, which is
> a collection of many other tests (and sometimes sub-tests as well), to provide
> better testing experience and allow to start converting many individual test
> programs under selftests/bpf into a single and convenient test runner.
>
> Patch #1 fixes an issue with the Makefile that causes prog_tests/test.h to be
> compiled as C code. This fix allows changing how test.h is generated, giving
> more control over which tests are run and how.
>
> Patch #2 changes how test.h is auto-generated, which allows having test
> definitions instead of just running test functions. This makes more
> complicated test run policies possible.
>
> Patch #3 adds `-t <test-name>` and `-n <test-num>` selectors to run only
> subset of tests.
>
> Patch #4 changes libbpf_set_print() to return previously set print callback,
> allowing to temporarily replace current print callback and then set it back.
> This is necessary for some tests that want more control over libbpf logging.
>
> Patch #5 sets up and takes over libbpf logging from individual tests to
> test_prog runner, adding -vv verbosity to capture debug output from libbpf.
> This is useful when debugging failing tests.
>
> Patch #6 furthers test output management and buffers it by default, emitting
> log output only if test fails. This gives succinct and clean default test
> output. It's possible to bypass this behavior with -v flag, which will turn
> off test output buffering.
>
> Patch #7 adds support for sub-tests. It also enhances -t and -n selectors to
> both support ability to specify sub-test selectors, as well as enhancing
> number selector to accept sets of test, instead of just individual test
> number.
>
> Patch #8 converts bpf_verif_scale.c test to use sub-test APIs.
>
> Patch #9 converts send_signal.c tests to use sub-test APIs.
>
> v2->v3:
> - fix buffered output rare uninitialized value bug (Alexei);
> - fix buffered output va_list reuse bug (Alexei);
> - fix buffered output truncation due to interleaving zero terminators;
Looks great.
Applied. Thanks!
^ permalink raw reply
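The va_list reuse fix listed in the v2->v3 notes above follows a general C rule: a va_list that has been passed to a v*printf function must not be used again, so a second pass needs va_copy(). The following is a minimal sketch of that rule only, not the test_progs code; log_vprintf() and log_printf() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from test_progs): format into buf and return
 * the number of bytes the full output needs, excluding the terminator.
 */
int log_vprintf(char *buf, size_t cap, const char *fmt, va_list args)
{
	va_list copy;
	int need;

	/* First pass consumes only the copy, so args stays usable. */
	va_copy(copy, args);
	need = vsnprintf(NULL, 0, fmt, copy);
	va_end(copy);
	if (need < 0)
		return need;

	/* Second pass: args has not been consumed yet. */
	vsnprintf(buf, cap, fmt, args);
	return need;
}

/* Variadic front end that owns va_start()/va_end(). */
int log_printf(char *buf, size_t cap, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = log_vprintf(buf, cap, fmt, args);
	va_end(args);
	return ret;
}
```

Measuring with the copy first is one possible arrangement; the point is only that each formatting pass gets its own va_list.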
* [PATCH net-next v4 0/3] flow_offload: add indr-block in nf_table_offload
From: wenxu @ 2019-07-28 6:52 UTC (permalink / raw)
To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
From: wenxu <wenxu@ucloud.cn>
This patch series makes nftables offload support vlan and
tunnel device offload through the indr-block architecture.
The first patch moves the tc indr block to flow offload and renames
it to flow-indr-block.
Because the new flow-indr-block can't get the tcf_block
directly, the second patch provides a callback to get the tcf_block
immediately when the device registers and contains an ingress block.
The third patch makes nf_tables_offload support flow-indr-block.
wenxu (3):
flow_offload: move tc indirect block to flow offload
flow_offload: Support get default block from tc immediately
netfilter: nf_tables_offload: support indr block call
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +-
.../net/ethernet/netronome/nfp/flower/offload.c | 10 +-
include/net/flow_offload.h | 39 ++++
include/net/pkt_cls.h | 42 +---
include/net/sch_generic.h | 3 -
net/core/flow_offload.c | 181 +++++++++++++++
net/netfilter/nf_tables_offload.c | 131 +++++++++--
net/sched/cls_api.c | 246 ++++-----------------
8 files changed, 385 insertions(+), 277 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next v4 2/3] flow_offload: Support get default block from tc immediately
From: wenxu @ 2019-07-28 6:52 UTC (permalink / raw)
To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
In-Reply-To: <1564296769-32294-1-git-send-email-wenxu@ucloud.cn>
From: wenxu <wenxu@ucloud.cn>
When the indr device registers, it can get the default block
from tc immediately if the block exists.
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v3: no change
v4: get tc default block without callback
include/net/pkt_cls.h | 7 +++++++
net/core/flow_offload.c | 2 ++
net/sched/cls_api.c | 33 +++++++++++++++++++++++++++++++++
3 files changed, 42 insertions(+)
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0790a4e..77c3a42 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -54,6 +54,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
struct tcf_block_ext_info *ei);
+void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev);
+
static inline bool tcf_block_shared(struct tcf_block *block)
{
return block->index;
@@ -74,6 +76,11 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res, bool compat_mode);
#else
+static inline
+void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+}
+
static inline bool tcf_block_shared(struct tcf_block *block)
{
return false;
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 9f1ae67..0ca3d51 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -3,6 +3,7 @@
#include <linux/slab.h>
#include <net/flow_offload.h>
#include <linux/rtnetlink.h>
+#include <net/pkt_cls.h>
struct flow_rule *flow_rule_alloc(unsigned int num_actions)
{
@@ -312,6 +313,7 @@ static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *de
INIT_LIST_HEAD(&indr_dev->cb_list);
indr_dev->dev = dev;
+ tc_indr_get_default_block(indr_dev);
if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
flow_indr_setup_block_ht_params)) {
kfree(indr_dev);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d551c56..59e9572 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -576,6 +576,39 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
tcf_block_setup(block, &bo);
}
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+ const struct Qdisc_class_ops *cops;
+ struct Qdisc *qdisc;
+
+ if (!dev_ingress_queue(dev))
+ return NULL;
+
+ qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+ if (!qdisc)
+ return NULL;
+
+ cops = qdisc->ops->cl_ops;
+ if (!cops)
+ return NULL;
+
+ if (!cops->tcf_block)
+ return NULL;
+
+ return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+void tc_indr_get_default_block(struct flow_indr_block_dev *indr_dev)
+{
+ struct tcf_block *block = tc_dev_ingress_block(indr_dev->dev);
+
+ if (block) {
+ indr_dev->flow_block = &block->flow_block;
+ indr_dev->ing_cmd_cb = tc_indr_block_ing_cmd;
+ }
+}
+EXPORT_SYMBOL(tc_indr_get_default_block);
+
static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
struct tcf_block_ext_info *ei,
enum flow_block_command command,
--
1.8.3.1
^ permalink raw reply related
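The lookup in tc_dev_ingress_block() above walks a chain of optional pointers and gives up with NULL at the first missing link. A userspace sketch of that guard pattern follows. All model_* types and functions are simplified stand-ins assumed for illustration; they are not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for tcf_block, Qdisc_class_ops, Qdisc and so on. */
struct model_block { int id; };

struct model_class_ops {
	struct model_block *(*get_block)(void *qdisc_priv);
};

struct model_qdisc {
	const struct model_class_ops *cl_ops;	/* may be NULL */
	void *priv;
};

struct model_queue { struct model_qdisc *qdisc; };
struct model_netdev { struct model_queue *ingress_queue; };

/* Return the ingress block, or NULL if any link in the chain is missing. */
struct model_block *model_ingress_block(const struct model_netdev *dev)
{
	const struct model_qdisc *q;

	if (!dev->ingress_queue)
		return NULL;

	q = dev->ingress_queue->qdisc;
	if (!q || !q->cl_ops || !q->cl_ops->get_block)
		return NULL;

	return q->cl_ops->get_block(q->priv);
}

/* Hypothetical get_block implementation: the private data is the block. */
struct model_block *model_priv_block(void *priv)
{
	return priv;
}
```

A caller such as tc_indr_get_default_block() then only has to test the single returned pointer before wiring up the block and its callback.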
* [PATCH net-next v4 3/3] netfilter: nf_tables_offload: support indr block call
From: wenxu @ 2019-07-28 6:52 UTC (permalink / raw)
To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
In-Reply-To: <1564296769-32294-1-git-send-email-wenxu@ucloud.cn>
From: wenxu <wenxu@ucloud.cn>
Make nftables support the indr-block call, so that nftables can offload
vlan and tunnel devices.
nft add table netdev firewall
nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; }
nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0
nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; }
nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v3: subsys_initcall for init_flow_indr_rhashtable
v4: guarantee only one offload base chain used per indr dev.
If the indr_block_cmd bind fails, return unsupported.
net/netfilter/nf_tables_offload.c | 131 +++++++++++++++++++++++++++++++-------
1 file changed, 107 insertions(+), 24 deletions(-)
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5..19214ad 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -171,24 +171,123 @@ static int nft_flow_offload_unbind(struct flow_block_offload *bo,
return 0;
}
+static int nft_block_setup(struct nft_base_chain *basechain,
+ struct flow_block_offload *bo,
+ enum flow_block_command cmd)
+{
+ int err;
+
+ switch (cmd) {
+ case FLOW_BLOCK_BIND:
+ err = nft_flow_offload_bind(bo, basechain);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ err = nft_flow_offload_unbind(bo, basechain);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ err = -EOPNOTSUPP;
+ }
+
+ return err;
+}
+
+static int nft_block_offload_cmd(struct nft_base_chain *chain,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ struct netlink_ext_ack extack = {};
+ struct flow_block_offload bo = {};
+ int err;
+
+ bo.net = dev_net(dev);
+ bo.block = &chain->flow_block;
+ bo.command = cmd;
+ bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+ bo.extack = &extack;
+ INIT_LIST_HEAD(&bo.cb_list);
+
+ err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+ if (err < 0)
+ return err;
+
+ return nft_block_setup(chain, &bo, cmd);
+}
+
+static void nft_indr_block_ing_cmd(struct net_device *dev,
+ struct flow_block *flow_block,
+ struct flow_indr_block_cb *indr_block_cb,
+ enum flow_block_command cmd)
+{
+ struct netlink_ext_ack extack = {};
+ struct flow_block_offload bo = {};
+ struct nft_base_chain *chain;
+
+ if (!flow_block)
+ return;
+
+ chain = container_of(flow_block, struct nft_base_chain, flow_block);
+
+ bo.net = dev_net(dev);
+ bo.block = flow_block;
+ bo.command = cmd;
+ bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+ bo.extack = &extack;
+ INIT_LIST_HEAD(&bo.cb_list);
+
+ indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
+
+ nft_block_setup(chain, &bo, cmd);
+}
+
+static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
+ struct net_device *dev,
+ enum flow_block_command cmd)
+{
+ struct flow_indr_block_cb *indr_block_cb;
+ struct flow_indr_block_dev *indr_dev;
+ struct flow_block_offload bo = {};
+ struct netlink_ext_ack extack = {};
+
+ bo.net = dev_net(dev);
+ bo.block = &chain->flow_block;
+ bo.command = cmd;
+ bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+ bo.extack = &extack;
+ INIT_LIST_HEAD(&bo.cb_list);
+
+ indr_dev = flow_indr_block_dev_lookup(dev);
+ if (!indr_dev)
+ return -EOPNOTSUPP;
+
+ indr_dev->flow_block = cmd == FLOW_BLOCK_BIND ? &chain->flow_block : NULL;
+ indr_dev->ing_cmd_cb = cmd == FLOW_BLOCK_BIND ? nft_indr_block_ing_cmd : NULL;
+
+ list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+ indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
+ &bo);
+
+ if (list_empty(&bo.cb_list))
+ return -EOPNOTSUPP;
+
+ return nft_block_setup(chain, &bo, cmd);
+}
+
#define FLOW_SETUP_BLOCK TC_SETUP_BLOCK
static int nft_flow_offload_chain(struct nft_trans *trans,
enum flow_block_command cmd)
{
struct nft_chain *chain = trans->ctx.chain;
- struct netlink_ext_ack extack = {};
- struct flow_block_offload bo = {};
struct nft_base_chain *basechain;
struct net_device *dev;
- int err;
if (!nft_is_base_chain(chain))
return -EOPNOTSUPP;
basechain = nft_base_chain(chain);
dev = basechain->ops.dev;
- if (!dev || !dev->netdev_ops->ndo_setup_tc)
+ if (!dev)
return -EOPNOTSUPP;
/* Only default policy to accept is supported for now. */
@@ -197,26 +296,10 @@ static int nft_flow_offload_chain(struct nft_trans *trans,
nft_trans_chain_policy(trans) != NF_ACCEPT)
return -EOPNOTSUPP;
- bo.command = cmd;
- bo.block = &basechain->flow_block;
- bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
- bo.extack = &extack;
- INIT_LIST_HEAD(&bo.cb_list);
-
- err = dev->netdev_ops->ndo_setup_tc(dev, FLOW_SETUP_BLOCK, &bo);
- if (err < 0)
- return err;
-
- switch (cmd) {
- case FLOW_BLOCK_BIND:
- err = nft_flow_offload_bind(&bo, basechain);
- break;
- case FLOW_BLOCK_UNBIND:
- err = nft_flow_offload_unbind(&bo, basechain);
- break;
- }
-
- return err;
+ if (dev->netdev_ops->ndo_setup_tc)
+ return nft_block_offload_cmd(basechain, dev, cmd);
+ else
+ return nft_indr_block_offload_cmd(basechain, dev, cmd);
}
int nft_flow_rule_offload_commit(struct net *net)
--
1.8.3.1
^ permalink raw reply related
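The dispatch at the end of nft_flow_offload_chain() above prefers the device's own ndo_setup_tc hook and falls back to the indirect callbacks when the device has none, returning -EOPNOTSUPP if nothing took the request. The following userspace sketch models only that control flow. The model_* names are hypothetical, and counting callbacks that return 0 is a simplification of the patch's list_empty(&bo.cb_list) check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_dev;
typedef int (*model_setup_fn)(struct model_dev *dev, int cmd);

/* Simplified stand-in for a net_device plus its indirect callback list. */
struct model_dev {
	model_setup_fn setup;		/* direct hook, like ndo_setup_tc; may be NULL */
	model_setup_fn indr_cbs[4];	/* callbacks registered by other drivers */
	size_t n_indr_cbs;
};

int model_offload_cmd(struct model_dev *dev, int cmd)
{
	size_t i, accepted = 0;

	/* Direct path: the device can set up the block itself. */
	if (dev->setup)
		return dev->setup(dev, cmd);

	/* Indirect path: offer the request to every registered callback. */
	for (i = 0; i < dev->n_indr_cbs; i++)
		if (dev->indr_cbs[i](dev, cmd) == 0)
			accepted++;

	return accepted ? 0 : -EOPNOTSUPP;
}

/* Hypothetical callbacks used to exercise both outcomes. */
int model_accept(struct model_dev *dev, int cmd)
{
	(void)dev;
	(void)cmd;
	return 0;
}

int model_reject(struct model_dev *dev, int cmd)
{
	(void)dev;
	(void)cmd;
	return -EOPNOTSUPP;
}
```

A device such as a vlan or tunnel netdev, which has no hardware of its own, would take the indirect path in this model.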
* [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: wenxu @ 2019-07-28 6:52 UTC (permalink / raw)
To: pablo, fw, jakub.kicinski; +Cc: netfilter-devel, netdev
In-Reply-To: <1564296769-32294-1-git-send-email-wenxu@ucloud.cn>
From: wenxu <wenxu@ucloud.cn>
Move the tc indirect block to flow_offload and rename
it to flow indirect block, so that nf_tables can use the
indr block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
---
v3: subsys_initcall for init_flow_indr_rhashtable
v4: no change
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 10 +-
.../net/ethernet/netronome/nfp/flower/offload.c | 10 +-
include/net/flow_offload.h | 39 ++++
include/net/pkt_cls.h | 35 ---
include/net/sch_generic.h | 3 -
net/core/flow_offload.c | 179 ++++++++++++++++
net/sched/cls_api.c | 235 ++-------------------
7 files changed, 247 insertions(+), 264 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7f747cb..074573b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -785,9 +785,9 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
{
int err;
- err = __tc_indr_block_cb_register(netdev, rpriv,
- mlx5e_rep_indr_setup_tc_cb,
- rpriv);
+ err = __flow_indr_block_cb_register(netdev, rpriv,
+ mlx5e_rep_indr_setup_tc_cb,
+ rpriv);
if (err) {
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
@@ -800,8 +800,8 @@ static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev)
{
- __tc_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
- rpriv);
+ __flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_tc_cb,
+ rpriv);
}
static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index e209f15..6a0f034 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1479,16 +1479,16 @@ int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
return NOTIFY_OK;
if (event == NETDEV_REGISTER) {
- err = __tc_indr_block_cb_register(netdev, app,
- nfp_flower_indr_setup_tc_cb,
- app);
+ err = __flow_indr_block_cb_register(netdev, app,
+ nfp_flower_indr_setup_tc_cb,
+ app);
if (err)
nfp_flower_cmsg_warn(app,
"Indirect block reg failed - %s\n",
netdev->name);
} else if (event == NETDEV_UNREGISTER) {
- __tc_indr_block_cb_unregister(netdev,
- nfp_flower_indr_setup_tc_cb, app);
+ __flow_indr_block_cb_unregister(netdev,
+ nfp_flower_indr_setup_tc_cb, app);
}
return NOTIFY_OK;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00b9aab..66f89bc 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -4,6 +4,7 @@
#include <linux/kernel.h>
#include <linux/list.h>
#include <net/flow_dissector.h>
+#include <linux/rhashtable.h>
struct flow_match {
struct flow_dissector *dissector;
@@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
INIT_LIST_HEAD(&flow_block->cb_list);
}
+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+ enum tc_setup_type type, void *type_data);
+
+struct flow_indr_block_cb {
+ struct list_head list;
+ void *cb_priv;
+ flow_indr_block_bind_cb_t *cb;
+ void *cb_ident;
+};
+
+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
+ struct flow_block *flow_block,
+ struct flow_indr_block_cb *indr_block_cb,
+ enum flow_block_command command);
+
+struct flow_indr_block_dev {
+ struct rhash_head ht_node;
+ struct net_device *dev;
+ unsigned int refcnt;
+ struct list_head cb_list;
+ flow_indr_block_ing_cmd_t *ing_cmd_cb;
+ struct flow_block *flow_block;
+};
+
+struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+void __flow_indr_block_cb_unregister(struct net_device *dev,
+ flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
+void flow_indr_block_cb_unregister(struct net_device *dev,
+ flow_indr_block_bind_cb_t *cb, void *cb_ident);
+
#endif /* _NET_FLOW_OFFLOAD_H */
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e429809..0790a4e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -70,15 +70,6 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
return block->q;
}
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident);
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident);
-void __tc_indr_block_cb_unregister(struct net_device *dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident);
-void tc_indr_block_cb_unregister(struct net_device *dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident);
-
int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res, bool compat_mode);
@@ -137,32 +128,6 @@ void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
{
}
-static inline
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- return 0;
-}
-
-static inline
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- return 0;
-}
-
-static inline
-void __tc_indr_block_cb_unregister(struct net_device *dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-}
-
-static inline
-void tc_indr_block_cb_unregister(struct net_device *dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
-}
-
static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res, bool compat_mode)
{
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6b6b012..d9f359a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -23,9 +23,6 @@
struct module;
struct bpf_flow_keys;
-typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
- enum tc_setup_type type, void *type_data);
-
struct qdisc_rate_table {
struct tc_ratespec rate;
u32 data[256];
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index d63b970..9f1ae67 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -2,6 +2,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <net/flow_offload.h>
+#include <linux/rtnetlink.h>
struct flow_rule *flow_rule_alloc(unsigned int num_actions)
{
@@ -280,3 +281,181 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
}
}
EXPORT_SYMBOL(flow_block_cb_setup_simple);
+
+static struct rhashtable indr_setup_block_ht;
+
+static const struct rhashtable_params flow_indr_setup_block_ht_params = {
+ .key_offset = offsetof(struct flow_indr_block_dev, dev),
+ .head_offset = offsetof(struct flow_indr_block_dev, ht_node),
+ .key_len = sizeof(struct net_device *),
+};
+
+struct flow_indr_block_dev *
+flow_indr_block_dev_lookup(struct net_device *dev)
+{
+ return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
+ flow_indr_setup_block_ht_params);
+}
+EXPORT_SYMBOL(flow_indr_block_dev_lookup);
+
+static struct flow_indr_block_dev *flow_indr_block_dev_get(struct net_device *dev)
+{
+ struct flow_indr_block_dev *indr_dev;
+
+ indr_dev = flow_indr_block_dev_lookup(dev);
+ if (indr_dev)
+ goto inc_ref;
+
+ indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
+ if (!indr_dev)
+ return NULL;
+
+ INIT_LIST_HEAD(&indr_dev->cb_list);
+ indr_dev->dev = dev;
+ if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+ flow_indr_setup_block_ht_params)) {
+ kfree(indr_dev);
+ return NULL;
+ }
+
+inc_ref:
+ indr_dev->refcnt++;
+ return indr_dev;
+}
+
+static void flow_indr_block_dev_put(struct flow_indr_block_dev *indr_dev)
+{
+ if (--indr_dev->refcnt)
+ return;
+
+ rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
+ flow_indr_setup_block_ht_params);
+ kfree(indr_dev);
+}
+
+static struct flow_indr_block_cb *
+flow_indr_block_cb_lookup(struct flow_indr_block_dev *indr_dev,
+ flow_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+ struct flow_indr_block_cb *indr_block_cb;
+
+ list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
+ if (indr_block_cb->cb == cb &&
+ indr_block_cb->cb_ident == cb_ident)
+ return indr_block_cb;
+ return NULL;
+}
+
+static struct flow_indr_block_cb *
+flow_indr_block_cb_add(struct flow_indr_block_dev *indr_dev, void *cb_priv,
+ flow_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+ struct flow_indr_block_cb *indr_block_cb;
+
+ indr_block_cb = flow_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+ if (indr_block_cb)
+ return ERR_PTR(-EEXIST);
+
+ indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
+ if (!indr_block_cb)
+ return ERR_PTR(-ENOMEM);
+
+ indr_block_cb->cb_priv = cb_priv;
+ indr_block_cb->cb = cb;
+ indr_block_cb->cb_ident = cb_ident;
+ list_add(&indr_block_cb->list, &indr_dev->cb_list);
+
+ return indr_block_cb;
+}
+
+static void flow_indr_block_cb_del(struct flow_indr_block_cb *indr_block_cb)
+{
+ list_del(&indr_block_cb->list);
+ kfree(indr_block_cb);
+}
+
+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ flow_indr_block_bind_cb_t *cb,
+ void *cb_ident)
+{
+ struct flow_indr_block_cb *indr_block_cb;
+ struct flow_indr_block_dev *indr_dev;
+ int err;
+
+ indr_dev = flow_indr_block_dev_get(dev);
+ if (!indr_dev)
+ return -ENOMEM;
+
+ indr_block_cb = flow_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
+ err = PTR_ERR_OR_ZERO(indr_block_cb);
+ if (err)
+ goto err_dev_put;
+
+ if (indr_dev->ing_cmd_cb)
+ indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block, indr_block_cb,
+ FLOW_BLOCK_BIND);
+
+ return 0;
+
+err_dev_put:
+ flow_indr_block_dev_put(indr_dev);
+ return err;
+}
+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_register);
+
+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ flow_indr_block_bind_cb_t *cb,
+ void *cb_ident)
+{
+ int err;
+
+ rtnl_lock();
+ err = __flow_indr_block_cb_register(dev, cb_priv, cb, cb_ident);
+ rtnl_unlock();
+
+ return err;
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_cb_register);
+
+void __flow_indr_block_cb_unregister(struct net_device *dev,
+ flow_indr_block_bind_cb_t *cb,
+ void *cb_ident)
+{
+ struct flow_indr_block_cb *indr_block_cb;
+ struct flow_indr_block_dev *indr_dev;
+
+ indr_dev = flow_indr_block_dev_lookup(dev);
+ if (!indr_dev)
+ return;
+
+ indr_block_cb = flow_indr_block_cb_lookup(indr_dev, cb, cb_ident);
+ if (!indr_block_cb)
+ return;
+
+ /* Send unbind message if required to free any block cbs. */
+ if (indr_dev->ing_cmd_cb)
+ indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
+ indr_block_cb,
+ FLOW_BLOCK_UNBIND);
+
+ flow_indr_block_cb_del(indr_block_cb);
+ flow_indr_block_dev_put(indr_dev);
+}
+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_unregister);
+
+void flow_indr_block_cb_unregister(struct net_device *dev,
+ flow_indr_block_bind_cb_t *cb,
+ void *cb_ident)
+{
+ rtnl_lock();
+ __flow_indr_block_cb_unregister(dev, cb, cb_ident);
+ rtnl_unlock();
+}
+EXPORT_SYMBOL_GPL(flow_indr_block_cb_unregister);
+
+static int __init init_flow_indr_rhashtable(void)
+{
+ return rhashtable_init(&indr_setup_block_ht,
+ &flow_indr_setup_block_ht_params);
+}
+subsys_initcall(init_flow_indr_rhashtable);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3565d9a..d551c56 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -37,6 +37,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/flow_offload.h>
extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -545,235 +546,43 @@ static void tcf_chain_flush(struct tcf_chain *chain, bool rtnl_held)
}
}
-static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
-{
- const struct Qdisc_class_ops *cops;
- struct Qdisc *qdisc;
-
- if (!dev_ingress_queue(dev))
- return NULL;
-
- qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
- if (!qdisc)
- return NULL;
-
- cops = qdisc->ops->cl_ops;
- if (!cops)
- return NULL;
-
- if (!cops->tcf_block)
- return NULL;
-
- return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
-}
-
-static struct rhashtable indr_setup_block_ht;
-
-struct tc_indr_block_dev {
- struct rhash_head ht_node;
- struct net_device *dev;
- unsigned int refcnt;
- struct list_head cb_list;
- struct tcf_block *block;
-};
-
-struct tc_indr_block_cb {
- struct list_head list;
- void *cb_priv;
- tc_indr_block_bind_cb_t *cb;
- void *cb_ident;
-};
-
-static const struct rhashtable_params tc_indr_setup_block_ht_params = {
- .key_offset = offsetof(struct tc_indr_block_dev, dev),
- .head_offset = offsetof(struct tc_indr_block_dev, ht_node),
- .key_len = sizeof(struct net_device *),
-};
-
-static struct tc_indr_block_dev *
-tc_indr_block_dev_lookup(struct net_device *dev)
-{
- return rhashtable_lookup_fast(&indr_setup_block_ht, &dev,
- tc_indr_setup_block_ht_params);
-}
-
-static struct tc_indr_block_dev *tc_indr_block_dev_get(struct net_device *dev)
-{
- struct tc_indr_block_dev *indr_dev;
-
- indr_dev = tc_indr_block_dev_lookup(dev);
- if (indr_dev)
- goto inc_ref;
-
- indr_dev = kzalloc(sizeof(*indr_dev), GFP_KERNEL);
- if (!indr_dev)
- return NULL;
-
- INIT_LIST_HEAD(&indr_dev->cb_list);
- indr_dev->dev = dev;
- indr_dev->block = tc_dev_ingress_block(dev);
- if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
- tc_indr_setup_block_ht_params)) {
- kfree(indr_dev);
- return NULL;
- }
-
-inc_ref:
- indr_dev->refcnt++;
- return indr_dev;
-}
-
-static void tc_indr_block_dev_put(struct tc_indr_block_dev *indr_dev)
-{
- if (--indr_dev->refcnt)
- return;
-
- rhashtable_remove_fast(&indr_setup_block_ht, &indr_dev->ht_node,
- tc_indr_setup_block_ht_params);
- kfree(indr_dev);
-}
-
-static struct tc_indr_block_cb *
-tc_indr_block_cb_lookup(struct tc_indr_block_dev *indr_dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- struct tc_indr_block_cb *indr_block_cb;
-
- list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
- if (indr_block_cb->cb == cb &&
- indr_block_cb->cb_ident == cb_ident)
- return indr_block_cb;
- return NULL;
-}
-
-static struct tc_indr_block_cb *
-tc_indr_block_cb_add(struct tc_indr_block_dev *indr_dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- struct tc_indr_block_cb *indr_block_cb;
-
- indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
- if (indr_block_cb)
- return ERR_PTR(-EEXIST);
-
- indr_block_cb = kzalloc(sizeof(*indr_block_cb), GFP_KERNEL);
- if (!indr_block_cb)
- return ERR_PTR(-ENOMEM);
-
- indr_block_cb->cb_priv = cb_priv;
- indr_block_cb->cb = cb;
- indr_block_cb->cb_ident = cb_ident;
- list_add(&indr_block_cb->list, &indr_dev->cb_list);
-
- return indr_block_cb;
-}
-
-static void tc_indr_block_cb_del(struct tc_indr_block_cb *indr_block_cb)
-{
- list_del(&indr_block_cb->list);
- kfree(indr_block_cb);
-}
-
static int tcf_block_setup(struct tcf_block *block,
struct flow_block_offload *bo);
-static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
- struct tc_indr_block_cb *indr_block_cb,
+static void tc_indr_block_ing_cmd(struct net_device *dev,
+ struct flow_block *flow_block,
+ struct flow_indr_block_cb *indr_block_cb,
enum flow_block_command command)
{
+ struct tcf_block *block = flow_block ?
+ container_of(flow_block,
+ struct tcf_block,
+ flow_block) : NULL;
struct flow_block_offload bo = {
.command = command,
.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
- .net = dev_net(indr_dev->dev),
- .block_shared = tcf_block_non_null_shared(indr_dev->block),
+ .net = dev_net(dev),
+ .block_shared = tcf_block_non_null_shared(block),
};
INIT_LIST_HEAD(&bo.cb_list);
- if (!indr_dev->block)
- return;
-
- bo.block = &indr_dev->block->flow_block;
-
- indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
- &bo);
- tcf_block_setup(indr_dev->block, &bo);
-}
-
-int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- struct tc_indr_block_cb *indr_block_cb;
- struct tc_indr_block_dev *indr_dev;
- int err;
-
- indr_dev = tc_indr_block_dev_get(dev);
- if (!indr_dev)
- return -ENOMEM;
-
- indr_block_cb = tc_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
- err = PTR_ERR_OR_ZERO(indr_block_cb);
- if (err)
- goto err_dev_put;
-
- tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_BIND);
- return 0;
-
-err_dev_put:
- tc_indr_block_dev_put(indr_dev);
- return err;
-}
-EXPORT_SYMBOL_GPL(__tc_indr_block_cb_register);
-
-int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- int err;
-
- rtnl_lock();
- err = __tc_indr_block_cb_register(dev, cb_priv, cb, cb_ident);
- rtnl_unlock();
-
- return err;
-}
-EXPORT_SYMBOL_GPL(tc_indr_block_cb_register);
-
-void __tc_indr_block_cb_unregister(struct net_device *dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- struct tc_indr_block_cb *indr_block_cb;
- struct tc_indr_block_dev *indr_dev;
-
- indr_dev = tc_indr_block_dev_lookup(dev);
- if (!indr_dev)
+ if (!block)
return;
- indr_block_cb = tc_indr_block_cb_lookup(indr_dev, cb, cb_ident);
- if (!indr_block_cb)
- return;
+ bo.block = flow_block;
- /* Send unbind message if required to free any block cbs. */
- tc_indr_block_ing_cmd(indr_dev, indr_block_cb, FLOW_BLOCK_UNBIND);
- tc_indr_block_cb_del(indr_block_cb);
- tc_indr_block_dev_put(indr_dev);
-}
-EXPORT_SYMBOL_GPL(__tc_indr_block_cb_unregister);
+ indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK, &bo);
-void tc_indr_block_cb_unregister(struct net_device *dev,
- tc_indr_block_bind_cb_t *cb, void *cb_ident)
-{
- rtnl_lock();
- __tc_indr_block_cb_unregister(dev, cb, cb_ident);
- rtnl_unlock();
+ tcf_block_setup(block, &bo);
}
-EXPORT_SYMBOL_GPL(tc_indr_block_cb_unregister);
static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
struct tcf_block_ext_info *ei,
enum flow_block_command command,
struct netlink_ext_ack *extack)
{
- struct tc_indr_block_cb *indr_block_cb;
- struct tc_indr_block_dev *indr_dev;
+ struct flow_indr_block_cb *indr_block_cb;
+ struct flow_indr_block_dev *indr_dev;
struct flow_block_offload bo = {
.command = command,
.binder_type = ei->binder_type,
@@ -784,11 +593,12 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
};
INIT_LIST_HEAD(&bo.cb_list);
- indr_dev = tc_indr_block_dev_lookup(dev);
+ indr_dev = flow_indr_block_dev_lookup(dev);
if (!indr_dev)
return;
- indr_dev->block = command == FLOW_BLOCK_BIND ? block : NULL;
+ indr_dev->flow_block = command == FLOW_BLOCK_BIND ? &block->flow_block : NULL;
+ indr_dev->ing_cmd_cb = command == FLOW_BLOCK_BIND ? tc_indr_block_ing_cmd : NULL;
list_for_each_entry(indr_block_cb, &indr_dev->cb_list, list)
indr_block_cb->cb(dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
@@ -3358,11 +3168,6 @@ static int __init tc_filter_init(void)
if (err)
goto err_register_pernet_subsys;
- err = rhashtable_init(&indr_setup_block_ht,
- &tc_indr_setup_block_ht_params);
- if (err)
- goto err_rhash_setup_block_ht;
-
rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
@@ -3376,8 +3181,6 @@ static int __init tc_filter_init(void)
return 0;
-err_rhash_setup_block_ht:
- unregister_pernet_subsys(&tcf_net_ops);
err_register_pernet_subsys:
destroy_workqueue(tc_filter_wq);
return err;
--
1.8.3.1
^ permalink raw reply related
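flow_indr_block_dev_get() and flow_indr_block_dev_put() above keep one refcounted entry per net_device: the first get creates the entry, later gets only bump the count, and the last put removes and frees it. The sketch below models that lifecycle in userspace. It assumes a plain linked list in place of the rhashtable and single-threaded use in place of RTNL serialization; all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct flow_indr_block_dev. */
struct model_indr_dev {
	struct model_indr_dev *next;
	const void *key;		/* stands in for struct net_device * */
	unsigned int refcnt;
};

static struct model_indr_dev *model_devs;	/* list head instead of rhashtable */

struct model_indr_dev *model_dev_lookup(const void *key)
{
	struct model_indr_dev *d;

	for (d = model_devs; d; d = d->next)
		if (d->key == key)
			return d;
	return NULL;
}

/* Find or create the entry for key and take one reference on it. */
struct model_indr_dev *model_dev_get(const void *key)
{
	struct model_indr_dev *d = model_dev_lookup(key);

	if (!d) {
		d = calloc(1, sizeof(*d));
		if (!d)
			return NULL;
		d->key = key;
		d->next = model_devs;
		model_devs = d;
	}
	d->refcnt++;
	return d;
}

/* Drop one reference; unlink and free the entry when it was the last. */
void model_dev_put(struct model_indr_dev *d)
{
	struct model_indr_dev **pp;

	if (--d->refcnt)
		return;

	for (pp = &model_devs; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			break;
		}
	}
	free(d);
}
```

In the patch each registered callback holds one such reference, which is why __flow_indr_block_cb_unregister() ends with a put.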
* Re: [PATCH] rocker: fix memory leaks of fib_work on two error return paths
From: Jiri Pirko @ 2019-07-28 7:46 UTC (permalink / raw)
To: Colin King
Cc: David Ahern, David S . Miller, netdev, kernel-janitors,
linux-kernel
In-Reply-To: <20190727233726.3121-1-colin.king@canonical.com>
Sun, Jul 28, 2019 at 01:37:26AM CEST, colin.king@canonical.com wrote:
>From: Colin Ian King <colin.king@canonical.com>
>
>Currently there are two error return paths that leak memory allocated
>to fib_work. Fix this by kfree'ing fib_work before returning.
>
>Addresses-Coverity: ("Resource leak")
>Fixes: 19a9d136f198 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
>Fixes: dbcc4fa718ee ("rocker: Fail attempts to use routes with nexthop objects")
>Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
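
For readers following along, the pattern being fixed can be sketched in plain userspace C. This is only an illustration, not the rocker code: struct fib_work_sketch, work_alloc(), work_free(), queue_fib_event() and the live_allocs counter are hypothetical names, and the error codes are placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's deferred-work item. */
struct fib_work_sketch {
	int event;
};

/* Counts live allocations so a leak shows up as a non-zero value. */
static int live_allocs;

static struct fib_work_sketch *work_alloc(void)
{
	struct fib_work_sketch *w = calloc(1, sizeof(*w));

	if (w)
		live_allocs++;
	return w;
}

static void work_free(struct fib_work_sketch *w)
{
	free(w);
	live_allocs--;
}

/*
 * Returns 0 and hands ownership to *out on success. On the error path
 * the work item is freed before returning, which is the step a leaking
 * error path would be missing.
 */
static int queue_fib_event(int event, int unsupported,
			   struct fib_work_sketch **out)
{
	struct fib_work_sketch *w = work_alloc();

	if (!w)
		return -ENOMEM;
	w->event = event;

	if (unsupported) {
		work_free(w);	/* without this call, w would leak */
		return -EINVAL;
	}

	*out = w;
	return 0;
}
```

A caller that gets 0 back owns the item and frees it later; a caller that gets an error has nothing to clean up.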
^ permalink raw reply
* Re: INFO: rcu detected stall in vhost_worker
From: Michael S. Tsirkin @ 2019-07-28 8:36 UTC (permalink / raw)
To: Hillf Danton
Cc: syzbot, jasowang, kvm, linux-kbuild, linux-kernel, michal.lkml,
netdev, syzkaller-bugs, torvalds, virtualization, yamada.masahiro
In-Reply-To: <000000000000e87d14058e9728d7@google.com>
On Sat, Jul 27, 2019 at 04:23:23PM +0800, Hillf Danton wrote:
>
> Fri, 26 Jul 2019 08:26:01 -0700 (PDT)
> > syzbot has bisected this bug to:
> >
> > commit 0ecfebd2b52404ae0c54a878c872bb93363ada36
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Sun Jul 7 22:41:56 2019 +0000
> >
> > Linux 5.2
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=118810bfa00000
> > start commit: 13bf6d6a Add linux-next specific files for 20190725
> > git tree: linux-next
> > kernel config: https://syzkaller.appspot.com/x/.config?x=8ae987d803395886
> > dashboard link: https://syzkaller.appspot.com/bug?extid=36e93b425cd6eb54fcc1
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15112f3fa00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=131ab578600000
> >
> > Reported-by: syzbot+36e93b425cd6eb54fcc1@syzkaller.appspotmail.com
> > Fixes: 0ecfebd2b524 ("Linux 5.2")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -787,7 +787,6 @@ static void vhost_setup_uaddr(struct vho
> size_t size, bool write)
> {
> struct vhost_uaddr *addr = &vq->uaddrs[index];
> - spin_lock(&vq->mmu_lock);
>
> addr->uaddr = uaddr;
> addr->size = size;
> @@ -797,7 +796,10 @@ static void vhost_setup_uaddr(struct vho
> static void vhost_setup_vq_uaddr(struct vhost_virtqueue *vq)
> {
> spin_lock(&vq->mmu_lock);
> -
> + /*
> + * deadlock if managing to take mmu_lock again while
> + * setting up uaddr
> + */
> vhost_setup_uaddr(vq, VHOST_ADDR_DESC,
> (unsigned long)vq->desc,
> vhost_get_desc_size(vq, vq->num),
> --
Thanks!
I reverted this whole commit.
--
MST
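
The quoted diff removes a spin_lock(&vq->mmu_lock) from vhost_setup_uaddr() because its caller, vhost_setup_vq_uaddr(), already holds that lock. A rough userspace model of why the nested acquire cannot succeed is below. It is a sketch only: struct toy_spinlock and the toy_*/setup_* helpers are made-up names built on C11 atomic_flag, and a single try-lock attempt stands in for the endless spin a real spin_lock() would do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy non-recursive spinlock; not the kernel's spinlock_t. */
struct toy_spinlock {
	atomic_flag held;
};

/* One attempt to take the lock; a real spin_lock() would loop on failure. */
static bool toy_trylock(struct toy_spinlock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear(&l->held);
}

/* Helper that takes the lock itself, like the line the diff removes. */
static bool setup_uaddr_locking(struct toy_spinlock *l)
{
	if (!toy_trylock(l))
		return false;	/* spin_lock() would spin forever here */
	toy_unlock(l);
	return true;
}

/* Helper that relies on the caller already holding the lock. */
static bool setup_uaddr_unlocked(struct toy_spinlock *l)
{
	(void)l;
	return true;
}

/* Caller: takes the lock, then calls one of the helpers under it. */
static bool setup_vq_uaddr(struct toy_spinlock *l, bool helper_locks)
{
	bool ok;

	if (!toy_trylock(l))
		return false;
	ok = helper_locks ? setup_uaddr_locking(l) : setup_uaddr_unlocked(l);
	toy_unlock(l);
	return ok;
}
```

With helper_locks set, the inner attempt fails every time, which is the self-deadlock the comment in the diff describes.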
^ permalink raw reply
* Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Gal Pressman @ 2019-07-28 8:45 UTC (permalink / raw)
To: Jason Gunthorpe, Michal Kalderon
Cc: Kamal Heib, Ariel Elior, dledford@redhat.com,
linux-rdma@vger.kernel.org, davem@davemloft.net,
netdev@vger.kernel.org
In-Reply-To: <20190726132316.GA8695@ziepe.ca>
On 26/07/2019 16:23, Jason Gunthorpe wrote:
> On Fri, Jul 26, 2019 at 08:42:07AM +0000, Michal Kalderon wrote:
>
>>>> But we don't free entries from the xa_array (only when ucontext is
>>>> destroyed), so how will there be an empty element after we wrap?
>>>
>>> Oh!
>>>
>>> That should be fixed up too, in the general case if a user is
>>> creating/destroying driver objects in loop we don't want memory usage to
>>> be unbounded.
>>>
>>> The rdma_user_mmap stuff has VMA ops that can refcount the xa entry and
>>> now that this is core code it is easy enough to harmonize the two things and
>>> track the xa side from the struct rdma_umap_priv
>>>
>>> The question is, does EFA or qedr have a use model for this that allows a
>>> userspace verb to create/destroy in a loop? ie do we need to fix this right
>>> now?
>
>> The mapping occurs for every qp and cq creation. So yes.
>>
>> So do you mean add a ref-cnt to the xarray entry and from umap
>> decrease the refcnt and free?
>
> Yes, free the entry (release the HW resource) and release the xa_array
> ID.
This is a bit tricky for EFA.
The UAR BAR resources (LLQ for example) aren't cleaned up until the UAR is
deallocated, so many of the entries won't really be freed when the refcount
reaches zero (i.e. the HW considers these entries as refcounted as long as the
UAR exists). The best we can do is free the DMA buffers for appropriate entries.
^ permalink raw reply
* [PATCH net-next] r8169: make use of xmit_more
From: Heiner Kallweit @ 2019-07-28 9:25 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller
Cc: netdev@vger.kernel.org, Sander Eikelenboom, Eric Dumazet
There was a previous attempt to use xmit_more, but the change had to be
reverted because a transmit timeout sometimes occurred under load [0].
This may have been caused by a missing memory barrier, so the new attempt
keeps the memory barrier before the call to netif_stop_queue, as the
driver does today. The new attempt also changes the order of some calls
as suggested by Eric.
[0] https://lkml.org/lkml/2019/2/10/39
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 864ca529d..d9261e68f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5637,6 +5637,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
struct device *d = tp_to_dev(tp);
dma_addr_t mapping;
u32 opts[2], len;
+ bool stop_queue;
+ bool door_bell;
int frags;
if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
@@ -5680,13 +5682,13 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
txd->opts2 = cpu_to_le32(opts[1]);
- netdev_sent_queue(dev, skb->len);
-
skb_tx_timestamp(skb);
/* Force memory writes to complete before releasing descriptor */
dma_wmb();
+ door_bell = __netdev_sent_queue(dev, skb->len, netdev_xmit_more());
+
txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
/* Force all memory writes to complete before notifying device */
@@ -5694,14 +5696,19 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
tp->cur_tx += frags + 1;
- RTL_W8(tp, TxPoll, NPQ);
-
- if (!rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+ stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS);
+ if (unlikely(stop_queue)) {
/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
* not miss a ring update when it notices a stopped queue.
*/
smp_wmb();
netif_stop_queue(dev);
+ }
+
+ if (door_bell)
+ RTL_W8(tp, TxPoll, NPQ);
+
+ if (unlikely(stop_queue)) {
/* Sync with rtl_tx:
* - publish queue status and cur_tx ring index (write barrier)
* - refresh dirty_tx ring index (read barrier).
--
2.22.0
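
As a rough illustration of the doorbell batching this patch is after, here is a small userspace model. It is a sketch under several assumptions, not the driver: struct toy_nic, RING_SIZE, slots_avail() and toy_start_xmit() are hypothetical, every packet uses one descriptor, and the model also rings the doorbell whenever the ring fills up, which is a choice made for the sketch and not something the patch itself states.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* hypothetical ring size for the sketch */

struct toy_nic {
	unsigned int cur_tx;	/* descriptors handed to the device */
	unsigned int dirty_tx;	/* descriptors the device has completed */
	unsigned int doorbells;	/* number of simulated TxPoll writes */
	bool stopped;		/* simulated netif_stop_queue() state */
};

static unsigned int slots_avail(const struct toy_nic *nic)
{
	return RING_SIZE - (nic->cur_tx - nic->dirty_tx);
}

/*
 * Queue one single-descriptor packet. xmit_more means the stack has
 * promised another packet right away, so the doorbell write can be
 * deferred unless the queue is being stopped.
 */
static void toy_start_xmit(struct toy_nic *nic, bool xmit_more)
{
	bool stop_queue;
	bool door_bell;

	nic->cur_tx++;

	stop_queue = slots_avail(nic) == 0;
	if (stop_queue)
		nic->stopped = true;

	/* Sketch-only rule: ring at the end of a burst or on a full ring. */
	door_bell = !xmit_more || nic->stopped;
	if (door_bell)
		nic->doorbells++;
}
```

In this model a burst of three packets with xmit_more set on the first two costs one doorbell write instead of three.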
^ permalink raw reply related
* Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Kamal Heib @ 2019-07-28 9:30 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Michal Kalderon, ariel.elior, dledford, galpress, linux-rdma,
davem, netdev
In-Reply-To: <20190725175540.GA18757@ziepe.ca>
On Thu, Jul 25, 2019 at 02:55:40PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 09, 2019 at 05:17:30PM +0300, Michal Kalderon wrote:
> > Create some common APIs for adding entries to an xa_mmap,
> > searching for an entry, and freeing one.
> >
> > The code was copied from the efa driver almost as is, just renamed
> > function to be generic and not efa specific.
> >
> > Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > drivers/infiniband/core/device.c | 1 +
> > drivers/infiniband/core/rdma_core.c | 1 +
> > drivers/infiniband/core/uverbs_cmd.c | 1 +
> > drivers/infiniband/core/uverbs_main.c | 135 ++++++++++++++++++++++++++++++++++
> > include/rdma/ib_verbs.h | 46 ++++++++++++
> > 5 files changed, 184 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index 8a6ccb936dfe..a830c2c5d691 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
> > SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> > SET_DEVICE_OP(dev_ops, map_phys_fmr);
> > SET_DEVICE_OP(dev_ops, mmap);
> > + SET_DEVICE_OP(dev_ops, mmap_free);
> > SET_DEVICE_OP(dev_ops, modify_ah);
> > SET_DEVICE_OP(dev_ops, modify_cq);
> > SET_DEVICE_OP(dev_ops, modify_device);
> > diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..1ed01b02401f 100644
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -816,6 +816,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
> >
> > rdma_restrack_del(&ucontext->res);
> >
> > + rdma_user_mmap_entries_remove_free(ucontext);
> > ib_dev->ops.dealloc_ucontext(ucontext);
> > kfree(ucontext);
> >
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> > index 7ddd0e5bc6b3..44c0600245e4 100644
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
> >
> > mutex_init(&ucontext->per_mm_list_lock);
> > INIT_LIST_HEAD(&ucontext->per_mm_list);
> > + xa_init(&ucontext->mmap_xa);
> >
> > ret = get_unused_fd_flags(O_CLOEXEC);
> > if (ret < 0)
> > diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> > index 11c13c1381cf..4b909d7b97de 100644
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -965,6 +965,141 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
> > }
> > EXPORT_SYMBOL(rdma_user_mmap_io);
> >
> > +static inline u64
> > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> > +{
> > + return (u64)entry->mmap_page << PAGE_SHIFT;
> > +}
> > +
> > +/**
> > + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @key: The key received from rdma_user_mmap_entry_insert which
> > + * is provided by user as the address to map.
> > + * @len: The length the user wants to map
> > + *
> > + * This function is called when a user tries to mmap a key it
> > + * initially received from the driver. The key was created by
> > + * the function rdma_user_mmap_entry_insert.
> > + *
> > + * Return an entry if exists or NULL if there is no match.
> > + */
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> > +{
> > + struct rdma_user_mmap_entry *entry;
> > + u64 mmap_page;
> > +
> > + mmap_page = key >> PAGE_SHIFT;
> > + if (mmap_page > U32_MAX)
> > + return NULL;
> > +
> > + entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > + if (!entry || entry->length != len)
> > + return NULL;
> > +
> > + ibdev_dbg(ucontext->device,
> > + "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> > + entry->obj, key, entry->address, entry->length);
> > +
> > + return entry;
> > +}
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
>
> It is a mistake we keep making, and maybe the war is hopelessly lost
> now, but functions called from a driver should not be part of the
> ib_uverbs module - ideally uverbs is an optional module. They should
> be in ib_core.
>
> Maybe put this in ib_core_uverbs.c ?
>
> Kamal, you've been tackling various cleanups, maybe making ib_uverbs
> unloadable again is something you'd be keen on?
>
Yes, could you please give some background on that?
> > +/**
> > + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
> > + *
> > + * @ucontext: associated user context.
> > + * @obj: opaque driver object that will be stored in the entry.
> > + * @address: The address that will be mmapped to the user
> > + * @length: Length of the address that will be mmapped
> > + * @mmap_flag: opaque driver flags related to the address (For
> > + * example could be used for cacheability)
> > + *
> > + * This function should be called by drivers that use the rdma_user_mmap
> > + * interface for handling user mmapped addresses. The database is handled in
> > + * the core and helper functions are provided to insert entries into the
> > + * database and extract entries when the user calls mmap with the given key.
> > + * The function returns a unique key that should be provided to user, the user
> > + * will use the key to map the given address.
> > + *
> > + * Note this locking scheme cannot support removal of entries,
> > + * except during ucontext destruction when the core code
> > + * guarantees no concurrency.
> > + *
> > + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
> > + */
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> > + u64 address, u64 length, u8 mmap_flag)
> > +{
> > + struct rdma_user_mmap_entry *entry;
> > + u32 next_mmap_page;
> > + int err;
> > +
> > + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> > + if (!entry)
> > + return RDMA_USER_MMAP_INVALID;
> > +
> > + entry->obj = obj;
> > + entry->address = address;
> > + entry->length = length;
> > + entry->mmap_flag = mmap_flag;
> > +
> > + xa_lock(&ucontext->mmap_xa);
> > + if (check_add_overflow(ucontext->mmap_xa_page,
> > + (u32)(length >> PAGE_SHIFT),
>
> Should this be divide round up ?
>
> > + &next_mmap_page))
> > + goto err_unlock;
>
> I still don't like that this algorithm latches into a permanent
> failure when the xa_page wraps.
>
> It seems worth spending a bit more time here to tidy this.. Keep using
> the mmap_xa_page scheme, but instead do something like
>
> alloc_cyclic_range():
>
> while () {
> // Find first empty element in a cyclic way
> xa_page_first = mmap_xa_page;
> xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK)
>
> // Is there enough room to have the range?
> if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
> mmap_xa_page = 0;
> continue;
> }
>
> // See if the element before intersects
> elm = xa_find(xa, &zero, xa_page_end, 0);
> if (elm && intersects(xa_page_first, xa_page_end, elm->first, elm->last)) {
> mmap_xa_page = elm->last + 1;
> continue;
> }
>
> // xa_page_first -> xa_page_end should now be free
> xa_insert(xa, xa_page_first, entry);
> mmap_xa_page = xa_page_end + 1;
> return xa_page_first;
> }
>
> Approximately, please check it.
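
To make the idea above concrete for anyone skimming the thread, here is a userspace sketch of a cyclic range allocator that also rounds the length up to whole pages. It is not xarray code and not the eventual kernel implementation: struct page_space, the SKETCH_* constants and all function names are hypothetical, and a tiny bool array stands in for the xarray so that wrap-around is easy to trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)
#define SKETCH_NPAGES     16u	/* tiny index space so wrap-around is easy to hit */
#define SKETCH_INVALID    UINT32_MAX

struct page_space {
	bool used[SKETCH_NPAGES];
	uint32_t next;		/* where the next search starts */
};

/* Round a byte length up to whole pages ("divide round up"). */
static uint64_t len_to_pages(uint64_t length)
{
	return (length >> SKETCH_PAGE_SHIFT) +
	       ((length & (SKETCH_PAGE_SIZE - 1)) ? 1 : 0);
}

static bool range_free(const struct page_space *ps, uint32_t first, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		if (ps->used[first + i])
			return false;
	return true;
}

/* Returns the first page index of the range, or SKETCH_INVALID. */
static uint32_t alloc_cyclic_range(struct page_space *ps, uint64_t npages)
{
	uint32_t first = ps->next;
	bool wrapped = false;

	if (npages == 0 || npages > SKETCH_NPAGES)
		return SKETCH_INVALID;

	for (;;) {
		if (first + npages > SKETCH_NPAGES) {
			/* Range would run off the end: restart from 0 once. */
			if (wrapped)
				return SKETCH_INVALID;
			wrapped = true;
			first = 0;
			continue;
		}
		if (range_free(ps, first, (uint32_t)npages))
			break;
		first++;
	}

	for (uint32_t i = 0; i < npages; i++)
		ps->used[first + i] = true;
	ps->next = first + (uint32_t)npages;
	return first;
}

static void free_range(struct page_space *ps, uint32_t first, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		ps->used[first + i] = false;
}
```

The point of the sketch is that hitting the end of the index space is not a permanent failure: once earlier ranges are freed, the next allocation wraps and reuses them.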
>
> > @@ -2199,6 +2201,17 @@ struct iw_cm_conn_param;
> >
> > #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
> >
> > +#define RDMA_USER_MMAP_FLAG_SHIFT 56
> > +#define RDMA_USER_MMAP_PAGE_MASK GENMASK(RDMA_USER_MMAP_FLAG_SHIFT - 1, 0)
> > +#define RDMA_USER_MMAP_INVALID U64_MAX
> > +struct rdma_user_mmap_entry {
> > + void *obj;
> > + u64 address;
> > + u64 length;
> > + u32 mmap_page;
> > + u8 mmap_flag;
> > +};
> > +
> > /**
> > * struct ib_device_ops - InfiniBand device operations
> > * This structure defines all the InfiniBand device operations, providers will
> > @@ -2311,6 +2324,19 @@ struct ib_device_ops {
> > struct ib_udata *udata);
> > void (*dealloc_ucontext)(struct ib_ucontext *context);
> > int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
> > + /**
> > + * Memory that is mapped to the user can only be freed once the
> > + * ucontext of the application is destroyed. This is for
> > + * security reasons where we don't want an application to have a
> > + * mapping to physical memory that is freed and allocated to
> > + * another application. For this reason, all the entries are
> > + * stored in ucontext and once ucontext is freed mmap_free is
> > + * called on each of the entries. They type of the memory that
>
> They -> the
>
> > + * was mapped may differ between entries and is opaque to the
> > + * rdma_user_mmap interface. Therefore needs to be implemented
> > + * by the driver in mmap_free.
> > + */
> > + void (*mmap_free)(struct rdma_user_mmap_entry *entry);
> > void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
> > int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> > void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> > @@ -2709,6 +2735,11 @@ void ib_set_device_ops(struct ib_device *device,
> > #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
> > int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
> > unsigned long pfn, unsigned long size, pgprot_t prot);
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> > + u64 address, u64 length, u8 mmap_flag);
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len);
> > +void rdma_user_mmap_entries_remove_free(struct ib_ucontext
> > *ucontext);
>
> Should remove_free be in the core-priv header?
>
> Jason
^ permalink raw reply
* RE: [PATCH] net/mlx5e: Fix zero table prio set by user.
From: Paul Blakey @ 2019-07-28 10:04 UTC (permalink / raw)
To: Marcelo Ricardo Leitner, wenxu
Cc: Or Gerlitz, Saeed Mahameed, Roi Dayan, Mark Bloch,
pablo@netfilter.org, netdev@vger.kernel.org
In-Reply-To: <20190726140142.GC4063@localhost.localdomain>
On 7/26/2019 5:01 PM, Marcelo Ricardo Leitner wrote:
> On Fri, Jul 26, 2019 at 08:39:43PM +0800, wenxu wrote:
>>
>> On 2019/7/26 20:19, Or Gerlitz wrote:
>>> On Fri, Jul 26, 2019 at 12:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>>>> On Thu, 2019-07-25 at 19:24 +0800, wenxu@ucloud.cn wrote:
>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>
>>>>> The flow_cls_common_offload prio is zero
>>>>>
>>>>> This leads to an invalid table prio in hw.
>>>>>
>>>>> Error: Could not process rule: Invalid argument
>>>>>
>>>>> kernel log:
>>>>> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22
>>>>> (table prio: 65535, level: 0, size: 4194304)
>>>>>
>>>>> table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>>>> should check (chain * FDB_MAX_PRIO) + prio is not 0
>>>>>
>>>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>>>> ---
>>>>> drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git
>>>>> a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> index 089ae4d..64ca90f 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>>>>> @@ -970,7 +970,9 @@ static int esw_add_fdb_miss_rule(struct
>>>> this piece of code isn't in this function, weird how it got to the
>>>> diff, patch applies correctly though !
>>>>
>>>>> mlx5_eswitch *esw)
>>>>> flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
>>>>> MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>>>>>
>>>>> - table_prio = (chain * FDB_MAX_PRIO) + prio - 1;
>>>>> + table_prio = (chain * FDB_MAX_PRIO) + prio;
>>>>> + if (table_prio)
>>>>> + table_prio = table_prio - 1;
>>>>>
>>>> This is black magic, even before this fix.
>>>> this -1 seems to be needed in order to call
>>>> create_next_size_table(table_prio) with the previous "table prio" ?
>>>> (table_prio - 1) ?
>>>>
>>>> The whole thing looks wrong to me since when prio is 0 and chain is 0,
>>>> there is no such thing as table_prio - 1.
>>>>
>>>> mlnx eswitch guys in the cc, please advise.
>>> basically, prio 0 is not something we ever get in the driver, since if
>>> user space
>>> specifies 0, the kernel generates some random non-zero prio, and we support
>>> only prios 1-16 -- Wenxu -- what do you run to get this error?
>>>
>>>
>> I run offload with nftables (but not tc), so there is no prio for each rule.
>>
>> The prio of flow_cls_common_offload is initialized as 0.
>>
>> static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
>>
>> __be16 proto,
>> struct netlink_ext_ack *extack)
>> {
>> common->protocol = proto;
>> common->extack = extack;
>> }
>>
>>
>> flow_cls_common_offload
>
> Note that on
> [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
> I asked Pablo how nftables should behave in this situation.
>
> It's the same issue as in the patch above but being fixed at a
> different level.
That's better. Since the original code relied on prio 0 never being valid,
the suggested fix (net/mlx5e: Fix zero table prio set by user) maps NFT
offload prio 0 and tc prio 1 to the same hardware table. This is wrong and
can cause issues.
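
The arithmetic can be checked with a few lines of standalone C. This is a sketch under assumptions: FDB_MAX_PRIO is taken as 16 from the "prios 1-16" remark in the thread, the 16-bit width is inferred from the "table prio: 65535" in the quoted kernel log, and table_prio_orig() / table_prio_fixed() are illustrative names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

#define FDB_MAX_PRIO 16	/* assumed from "we support only prios 1-16" */

/* Original computation: prio 0 on chain 0 wraps around to 65535. */
static uint16_t table_prio_orig(uint16_t chain, uint16_t prio)
{
	return (uint16_t)(chain * FDB_MAX_PRIO + prio - 1);
}

/* Proposed fix from the patch: only subtract when the sum is non-zero. */
static uint16_t table_prio_fixed(uint16_t chain, uint16_t prio)
{
	uint16_t table_prio = (uint16_t)(chain * FDB_MAX_PRIO + prio);

	if (table_prio)
		table_prio = table_prio - 1;
	return table_prio;
}
```

With chain 0, table_prio_orig() wraps for prio 0, while table_prio_fixed() returns 0 for both prio 0 and prio 1, which is the collision described above.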
^ permalink raw reply
* Re: next-20190723: bpf/seccomp - systemd/journald issue?
From: Sedat Dilek @ 2019-07-28 11:09 UTC (permalink / raw)
To: Yonghong Song
Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
Martin Lau, Song Liu, netdev@vger.kernel.org, bpf@vger.kernel.org,
Clang-Built-Linux ML, Kees Cook, Nick Desaulniers,
Nathan Chancellor
In-Reply-To: <934a2a0a-c3fb-fd75-b8a3-c1042d73ca0c@fb.com>
On Sat, Jul 27, 2019 at 7:08 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/27/19 12:36 AM, Sedat Dilek wrote:
> > On Sat, Jul 27, 2019 at 4:24 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >>
> >> On Fri, Jul 26, 2019 at 2:19 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>
> >>> On Fri, Jul 26, 2019 at 11:10 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 7/26/19 2:02 PM, Sedat Dilek wrote:
> >>>>> On Fri, Jul 26, 2019 at 10:38 PM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Yonghong Song,
> >>>>>>
> >>>>>> On Fri, Jul 26, 2019 at 5:45 PM Yonghong Song <yhs@fb.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 7/26/19 1:26 AM, Sedat Dilek wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I have opened a new issue in the ClangBuiltLinux issue tracker.
> >>>>>>>
> >>>>>>> Glad to know clang 9 has asm goto support and now it can compile
> >>>>>>> kernel again.
> >>>>>>>
> >>>>>>
> >>>>>> Yupp.
> >>>>>>
> >>>>>>>>
> >>>>>>>> I am seeing a problem in the area bpf/seccomp causing
> >>>>>>>> systemd/journald/udevd services to fail.
> >>>>>>>>
> >>>>>>>> [Fri Jul 26 08:08:43 2019] systemd[453]: systemd-udevd.service: Failed
> >>>>>>>> to connect stdout to the journal socket, ignoring: Connection refused
> >>>>>>>>
> >>>>>>>> This happens when I use the (LLVM) LLD ld.lld-9 linker but not with
> >>>>>>>> BFD linker ld.bfd on Debian/buster AMD64.
> >>>>>>>> In both cases I use clang-9 (prerelease).
> >>>>>>>
> >>>>>>> Looks like it is a lld bug.
> >>>>>>>
> >>>>>>> I see the stack trace has __bpf_prog_run32() which is used by
> >>>>>>> kernel bpf interpreter. Could you try to enable bpf jit
> >>>>>>> sysctl net.core.bpf_jit_enable = 1
> >>>>>>> If this passes, it will prove it is interpreter related.
> >>>>>>>
> >>>>>>
> >>>>>> After...
> >>>>>>
> >>>>>> sysctl -w net.core.bpf_jit_enable=1
> >>>>>>
> >>>>>> I can start all failed systemd services.
> >>>>>>
> >>>>>> systemd-journald.service
> >>>>>> systemd-udevd.service
> >>>>>> haveged.service
> >>>>>>
> >>>>>> This is in maintenance mode.
> >>>>>>
> >>>>>> What is next: Do I set a permanent sysctl setting for net.core.bpf_jit_enable?
> >>>>>>
> >>>>>
> >>>>> This is what I did:
> >>>>
> >>>> I probably won't have cycles to debug this potential lld issue.
> >>>> Maybe you already did, I suggest you put enough reproducible
> >>>> details in the bug you filed against lld so they can take a look.
> >>>>
> >>>
> >>> I understand and will put the journalctl-log into the CBL issue
> >>> tracker and update the information.
> >>>
> >>> Thanks for your help understanding the BPF correlations.
> >>>
> >>> Is setting 'net.core.bpf_jit_enable = 2' helpful here?
> >>
> >> jit_enable=1 is enough.
> >> Or use CONFIG_BPF_JIT_ALWAYS_ON to work around it.
> >>
> >> It sounds like clang miscompiles interpreter.
> >> modprobe test_bpf
> >> should be able to point out which part of interpreter is broken.
> >
> > Maybe we need something like...
> >
> > "bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()"
> >
> > ...for clang?
>
> Not sure how you reached the conclusion that gcse is causing the problem.
> But anyway, adding such flag in the kernel is not a good idea.
> clang/llvm should be fixed instead. Esp. there is still time
> for 9.0.0 release to fix bugs.
>
To clarify: This is a snapshot release of clang-9 built with tc-build.
Building with -O0 is not possible as I see asm-goto failing.
- Sedat -
[1] https://github.com/ClangBuiltLinux/tc-build
> >
> > - Sedat -
> >
> > [1] https://git.kernel.org/linus/3193c0836f203a91bef96d88c64cccf0be090d9c
> >
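
For anyone who wants to check from a program which mode a box is in before and after the sysctl change discussed above, a minimal sketch follows. The helper names (parse_sysctl_int, read_sysctl_int, bpf_jit_mode_name) are made up for this example; the path is the procfs view of net.core.bpf_jit_enable, and the value meanings assumed here are 0 for the interpreter, 1 for JIT, and 2 for JIT with debug output.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define BPF_JIT_ENABLE_PATH "/proc/sys/net/core/bpf_jit_enable"

/* Parse the text form of an integer sysctl ("1\n"); returns 0 on success. */
static int parse_sysctl_int(const char *text, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (end == text || errno != 0)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;
	*out = (int)val;
	return 0;
}

/* Read one integer from a sysctl file; returns 0 on success, -1 on error. */
static int read_sysctl_int(const char *path, int *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_sysctl_int(buf, out);
	fclose(f);
	return ret;
}

/* Human-readable name for the value, under the assumptions stated above. */
static const char *bpf_jit_mode_name(int val)
{
	switch (val) {
	case 0:
		return "interpreter";
	case 1:
		return "JIT enabled";
	case 2:
		return "JIT enabled with debug output";
	default:
		return "unknown";
	}
}
```

A caller would pass BPF_JIT_ENABLE_PATH to read_sysctl_int() and print bpf_jit_mode_name() of the result; on a system without that file the read simply fails.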
^ permalink raw reply