Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] ipv4/route: do not check saddr dev if iif is LOOPBACK_IFINDEX
From: Stefano Brivio @ 2019-08-02 20:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Hangbin Liu, netdev, Marcelo Ricardo Leitner, David S . Miller
In-Reply-To: <f44d9f26-046d-38a2-13aa-d25b92419d11@gmail.com>

David,

On Thu, 1 Aug 2019 13:51:25 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 8/1/19 2:29 AM, Hangbin Liu wrote:
> > Jianlin reported a bug that for IPv4, ip route get from src_addr would fail
> > if src_addr is not an address on local system.
> > 
> > \# ip route get 1.1.1.1 from 2.2.2.2
> > RTNETLINK answers: Invalid argument  
> 
> so this is a forwarding lookup in which case iif should be set.

On actual forwarding, yes, it will be set.

But if we are just doing a lookup for a route (iif is
LOOPBACK_IFINDEX), I think this should still give us the matching route,
which is what IPv6 already does and what this patch fixes for IPv4.

Otherwise, we have no way to fetch that route, no matter if source
routing is configured. So I think this patch is correct and to some
extent necessary.

-- 
Stefano

^ permalink raw reply

* [PATCH] staging: isdn: remove unnecessary parentheses
From: Thiago Bonotto @ 2019-08-02 20:23 UTC (permalink / raw)
  To: Karsten Keil, Greg Kroah-Hartman, netdev, devel, linux-kernel,
	lkcamp

Fix the following checkpatch error:

ERROR: return is not a function, parentheses are not required
FILE: drivers/staging/isdn/hysdn/hysdn_net.c:289:
+        return (0);                /* and return success */

Signed-off-by: Thiago Bonotto <thbonotto@gmail.com>
---
Hello, this is my first contribution :)
Thanks for reviewing 

 drivers/staging/isdn/hysdn/hysdn_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/isdn/hysdn/hysdn_net.c b/drivers/staging/isdn/hysdn/hysdn_net.c
index bea37ae30..dcb9ef7a2 100644
--- a/drivers/staging/isdn/hysdn/hysdn_net.c
+++ b/drivers/staging/isdn/hysdn/hysdn_net.c
@@ -286,7 +286,7 @@ hysdn_net_create(hysdn_card *card)
 
 	if (card->debug_flags & LOG_NET_INIT)
 		hysdn_addlog(card, "network device created");
-	return (0);		/* and return success */
+	return 0;		/* and return success */
 }				/* hysdn_net_create */
 
 /***************************************************************************/
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf-next 0/3] selftests/bpf: switch test_progs back to stdio
From: Stanislav Fomichev @ 2019-08-02 20:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org,
	davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com
In-Reply-To: <c79d3a8c-4986-a321-7b68-5273be7c2be7@fb.com>

On 08/02, Andrii Nakryiko wrote:
> 
> On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> > I was looking into converting test_sockops* to test_progs framework
> > and that requires using cgroup_helpers.c which rely on stdio/stderr.
> > Let's use open_memstream to override stdout into buffer during
> > subtests instead of custom test_{v,}printf wrappers. That lets
> > us continue to use stdio in the subtests and dump it on failure
> > if required.
> >
> > That would also fix bpf_find_map which currently uses printf to
> > signal failure (missed during test_printf conversion).
> I wonder if we should hijack stderr as well?
I was planning to do it when I add cgroup_helpers support because they
log to stderr. Wanted to keep the changes minimal. But let's do
them all in this series while we are it.

> > Cc: Andrii Nakryiko <andriin@fb.com>
> >
> > Stanislav Fomichev (3):
> >   selftests/bpf: test_progs: switch to open_memstream
> >   selftests/bpf: test_progs: test__printf -> printf
> >   selftests/bpf: test_progs: drop extra trailing tab
> >
> >  .../bpf/prog_tests/bpf_verif_scale.c          |   4 +-
> >  .../selftests/bpf/prog_tests/l4lb_all.c       |   2 +-
> >  .../selftests/bpf/prog_tests/map_lock.c       |  10 +-
> >  .../selftests/bpf/prog_tests/send_signal.c    |   4 +-
> >  .../selftests/bpf/prog_tests/spinlock.c       |   2 +-
> >  .../bpf/prog_tests/stacktrace_build_id.c      |   4 +-
> >  .../bpf/prog_tests/stacktrace_build_id_nmi.c  |   4 +-
> >  .../selftests/bpf/prog_tests/xdp_noinline.c   |   4 +-
> >  tools/testing/selftests/bpf/test_progs.c      | 116 +++++++-----------
> >  tools/testing/selftests/bpf/test_progs.h      |  12 +-
> >  10 files changed, 68 insertions(+), 94 deletions(-)
> >

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] selftests/bpf: test_progs: switch to open_memstream
From: Stanislav Fomichev @ 2019-08-02 20:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org,
	davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com
In-Reply-To: <80957794-de90-b09b-89ef-6094d6357d9e@fb.com>

On 08/02, Andrii Nakryiko wrote:
> On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> > Use open_memstream to override stdout during test execution.
> > The copy of the original stdout is held in env.stdout and used
> > to print subtest info and dump failed log.
> 
> I really like the idea. I didn't know about open_memstream, it's awesome. Thanks!
One possible downside of using open_memstream is that it's glibc
specific. I probably need to wrap it in #ifdef __GLIBC__ to make
it work with other libcs and just print everything as it was before :-(.
I'm not sure we care though.

> > test_{v,}printf are now simple wrappers around stdout and will be
> > removed in the next patch.
> >
> > Cc: Andrii Nakryiko <andriin@fb.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  tools/testing/selftests/bpf/test_progs.c | 100 ++++++++++-------------
> >  tools/testing/selftests/bpf/test_progs.h |   2 +-
> >  2 files changed, 46 insertions(+), 56 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> > index db00196c8315..00d1565d01a3 100644
> > --- a/tools/testing/selftests/bpf/test_progs.c
> > +++ b/tools/testing/selftests/bpf/test_progs.c
> > @@ -40,14 +40,22 @@ static bool should_run(struct test_selector *sel, int num, const char *name)
> >  
> >  static void dump_test_log(const struct prog_test_def *test, bool failed)
> >  {
> > -	if (env.verbose || test->force_log || failed) {
> > -		if (env.log_cnt) {
> > -			fprintf(stdout, "%s", env.log_buf);
> > -			if (env.log_buf[env.log_cnt - 1] != '\n')
> > -				fprintf(stdout, "\n");
> > +	if (stdout == env.stdout)
> > +		return;
> > +
> > +	fflush(stdout); /* exports env.log_buf & env.log_cap */
> > +
> > +	if (env.log_cap && (env.verbose || test->force_log || failed)) {
> > +		int len = strlen(env.log_buf);
> 
> env.log_cap is not really a capacity, it's actual number of bytes (without terminating zero), so there is no need to do strlen and it's probably better to rename env.log_cap into env.log_cnt.
I'll rename it to log_size to match open_memstream args.
We probably still need to do strlen because open_memstream can allocate
bigger buffer to hold the data.

> > +
> > +		if (len) {
> > +			fprintf(env.stdout, "%s", env.log_buf);
> > +			if (env.log_buf[len - 1] != '\n')
> > +				fprintf(env.stdout, "\n");
> > +
> > +			fseeko(stdout, 0, SEEK_SET);
> Same bug as I already fixed with env.log_cnt = 0 being inside this if. You want to do seek always, not just when you print output log.
SG, will move to where we currently clear log_cnt, thanks!

> >  /* rewind */
> >  		}
> >  	}
> > -	env.log_cnt = 0;
> >  }
> >  
> >  void test__end_subtest()
> > @@ -62,7 +70,7 @@ void test__end_subtest()
> >  
> >  	dump_test_log(test, sub_error_cnt);
> >  
> > -	printf("#%d/%d %s:%s\n",
> > +	fprintf(env.stdout, "#%d/%d %s:%s\n",
> >  	       test->test_num, test->subtest_num,
> >  	       test->subtest_name, sub_error_cnt ? "FAIL" : "OK");
> >  }
> > @@ -100,53 +108,7 @@ void test__force_log() {
> >  
> >  void test__vprintf(const char *fmt, va_list args)
> >  {
> > -	size_t rem_sz;
> > -	int ret = 0;
> > -
> > -	if (env.verbose || (env.test && env.test->force_log)) {
> > -		vfprintf(stderr, fmt, args);
> > -		return;
> > -	}
> > -
> > -try_again:
> > -	rem_sz = env.log_cap - env.log_cnt;
> > -	if (rem_sz) {
> > -		va_list ap;
> > -
> > -		va_copy(ap, args);
> > -		/* we reserved extra byte for \0 at the end */
> > -		ret = vsnprintf(env.log_buf + env.log_cnt, rem_sz + 1, fmt, ap);
> > -		va_end(ap);
> > -
> > -		if (ret < 0) {
> > -			env.log_buf[env.log_cnt] = '\0';
> > -			fprintf(stderr, "failed to log w/ fmt '%s'\n", fmt);
> > -			return;
> > -		}
> > -	}
> > -
> > -	if (!rem_sz || ret > rem_sz) {
> > -		size_t new_sz = env.log_cap * 3 / 2;
> > -		char *new_buf;
> > -
> > -		if (new_sz < 4096)
> > -			new_sz = 4096;
> > -		if (new_sz < ret + env.log_cnt)
> > -			new_sz = ret + env.log_cnt;
> > -
> > -		/* +1 for guaranteed space for terminating \0 */
> > -		new_buf = realloc(env.log_buf, new_sz + 1);
> > -		if (!new_buf) {
> > -			fprintf(stderr, "failed to realloc log buffer: %d\n",
> > -				errno);
> > -			return;
> > -		}
> > -		env.log_buf = new_buf;
> > -		env.log_cap = new_sz;
> > -		goto try_again;
> > -	}
> > -
> > -	env.log_cnt += ret;
> > +	vprintf(fmt, args);
> >  }
> >  
> >  void test__printf(const char *fmt, ...)
> > @@ -477,6 +439,32 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
> >  	return 0;
> >  }
> >  
> > +static void stdout_hijack(void)
> > +{
> > +	if (env.verbose || (env.test && env.test->force_log)) {
> > +		/* nothing to do, output to stdout by default */
> > +		return;
> > +	}
> > +
> > +	/* stdout -> buffer */
> > +	fflush(stdout);
> > +	stdout = open_memstream(&env.log_buf, &env.log_cap);
> Check errors and restore original stdout if something went wrong? (And emit some warning to stderr).
Good point, will do.

> > +}
> > +
> > +static void stdout_restore(void)
> > +{
> > +	if (stdout == env.stdout)
> > +		return;
> > +
> > +	fclose(stdout);
> > +	free(env.log_buf);
> > +
> > +	env.log_buf = NULL;
> > +	env.log_cap = 0;
> > +
> > +	stdout = env.stdout;
> > +}
> > +
> >  int main(int argc, char **argv)
> >  {
> >  	static const struct argp argp = {
> > @@ -495,6 +483,7 @@ int main(int argc, char **argv)
> >  	srand(time(NULL));
> >  
> >  	env.jit_enabled = is_jit_enabled();
> > +	env.stdout = stdout;
> >  
> >  	for (i = 0; i < prog_test_cnt; i++) {
> >  		struct prog_test_def *test = &prog_test_defs[i];
> > @@ -508,6 +497,7 @@ int main(int argc, char **argv)
> >  				test->test_num, test->test_name))
> >  			continue;
> >  
> > +		stdout_hijack();
> Why do you do this for every test? Just do once before all the tests and restore after?
We can do that, my thinking was to limit the area of hijacking :-)
But that would work as well, less allocations per test, I guess. Will
do.

> >  		test->run_test();
> >  		/* ensure last sub-test is finalized properly */
> >  		if (test->subtest_name)
> > @@ -522,6 +512,7 @@ int main(int argc, char **argv)
> >  			env.succ_cnt++;
> >  
> >  		dump_test_log(test, test->error_cnt);
> > +		stdout_restore();
> >  
> >  		printf("#%d %s:%s\n", test->test_num, test->test_name,
> >  		       test->error_cnt ? "FAIL" : "OK");
> > @@ -529,7 +520,6 @@ int main(int argc, char **argv)
> >  	printf("Summary: %d/%d PASSED, %d FAILED\n",
> >  	       env.succ_cnt, env.sub_succ_cnt, env.fail_cnt);
> >  
> > -	free(env.log_buf);
> >  	free(env.test_selector.num_set);
> >  	free(env.subtest_selector.num_set);
> >  
> > diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> > index afd14962456f..9fd89078494f 100644
> > --- a/tools/testing/selftests/bpf/test_progs.h
> > +++ b/tools/testing/selftests/bpf/test_progs.h
> > @@ -56,8 +56,8 @@ struct test_env {
> >  	bool jit_enabled;
> >  
> >  	struct prog_test_def *test;
> > +	FILE *stdout;
> >  	char *log_buf;
> > -	size_t log_cnt;
> >  	size_t log_cap;
> So it's actually log_cnt that's assigned on fflush for memstream, according to man page, so probably keep log_cnt, delete log_cap.
Ack. See above, will rename to log_size, let me know if you disagree.

> >  	int succ_cnt; /* successful tests */

^ permalink raw reply

* Re: [PATCH bpf-next 0/3] selftests/bpf: switch test_progs back to stdio
From: Andrii Nakryiko @ 2019-08-02 20:00 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com
In-Reply-To: <20190802171710.11456-1-sdf@google.com>


On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> I was looking into converting test_sockops* to test_progs framework
> and that requires using cgroup_helpers.c which rely on stdio/stderr.
> Let's use open_memstream to override stdout into buffer during
> subtests instead of custom test_{v,}printf wrappers. That lets
> us continue to use stdio in the subtests and dump it on failure
> if required.
>
> That would also fix bpf_find_map which currently uses printf to
> signal failure (missed during test_printf conversion).
I wonder if we should hijack stderr as well?
>
> Cc: Andrii Nakryiko <andriin@fb.com>
>
> Stanislav Fomichev (3):
>   selftests/bpf: test_progs: switch to open_memstream
>   selftests/bpf: test_progs: test__printf -> printf
>   selftests/bpf: test_progs: drop extra trailing tab
>
>  .../bpf/prog_tests/bpf_verif_scale.c          |   4 +-
>  .../selftests/bpf/prog_tests/l4lb_all.c       |   2 +-
>  .../selftests/bpf/prog_tests/map_lock.c       |  10 +-
>  .../selftests/bpf/prog_tests/send_signal.c    |   4 +-
>  .../selftests/bpf/prog_tests/spinlock.c       |   2 +-
>  .../bpf/prog_tests/stacktrace_build_id.c      |   4 +-
>  .../bpf/prog_tests/stacktrace_build_id_nmi.c  |   4 +-
>  .../selftests/bpf/prog_tests/xdp_noinline.c   |   4 +-
>  tools/testing/selftests/bpf/test_progs.c      | 116 +++++++-----------
>  tools/testing/selftests/bpf/test_progs.h      |  12 +-
>  10 files changed, 68 insertions(+), 94 deletions(-)
>

^ permalink raw reply

* Re: [PATCH bpf-next 3/3] selftests/bpf: test_progs: drop extra trailing tab
From: Andrii Nakryiko @ 2019-08-02 19:59 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190802171710.11456-4-sdf@google.com>


On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> Small (un)related cleanup.
>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
Acked-by: Andrii Nakryiko <andriin@fb.com>
>  tools/testing/selftests/bpf/test_progs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index 71c717162ac8..477539d0adeb 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -279,7 +279,7 @@ enum ARG_KEYS {
>  	ARG_VERIFIER_STATS = 's',
>  	ARG_VERBOSE = 'v',
>  };
> -	
> +
>  static const struct argp_option opts[] = {
>  	{ "num", ARG_TEST_NUM, "NUM", 0,
>  	  "Run test number NUM only " },

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] selftests/bpf: test_progs: test__printf -> printf
From: Andrii Nakryiko @ 2019-08-02 19:59 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com
In-Reply-To: <20190802171710.11456-3-sdf@google.com>


On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> Now that test__printf is a simple wraper around printf, let's drop it
> (and test__vprintf as well).
>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---


Acked-by: Andrii Nakryiko <andriin@fb.com>


>  .../selftests/bpf/prog_tests/bpf_verif_scale.c   |  4 ++--
>  .../testing/selftests/bpf/prog_tests/l4lb_all.c  |  2 +-
>  .../testing/selftests/bpf/prog_tests/map_lock.c  | 10 +++++-----
>  .../selftests/bpf/prog_tests/send_signal.c       |  4 ++--
>  .../testing/selftests/bpf/prog_tests/spinlock.c  |  2 +-
>  .../bpf/prog_tests/stacktrace_build_id.c         |  4 ++--
>  .../bpf/prog_tests/stacktrace_build_id_nmi.c     |  4 ++--
>  .../selftests/bpf/prog_tests/xdp_noinline.c      |  4 ++--
>  tools/testing/selftests/bpf/test_progs.c         | 16 +---------------
>  tools/testing/selftests/bpf/test_progs.h         | 10 ++++------
>  10 files changed, 22 insertions(+), 38 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> index b4be96162ff4..3548ba2f24a8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c
> @@ -5,13 +5,13 @@ static int libbpf_debug_print(enum libbpf_print_level level,
>  			      const char *format, va_list args)
>  {
>  	if (level != LIBBPF_DEBUG) {
> -		test__vprintf(format, args);
> +		vprintf(format, args);
>  		return 0;
>  	}
>  
>  	if (!strstr(format, "verifier log"))
>  		return 0;
> -	test__vprintf("%s", args);
> +	vprintf("%s", args);
>  	return 0;
>  }
>  
> diff --git a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
> index 5ce572c03a5f..20ddca830e68 100644
> --- a/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
> +++ b/tools/testing/selftests/bpf/prog_tests/l4lb_all.c
> @@ -74,7 +74,7 @@ static void test_l4lb(const char *file)
>  	}
>  	if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
>  		error_cnt++;
> -		test__printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
> +		printf("test_l4lb:FAIL:stats %lld %lld\n", bytes, pkts);
>  	}
>  out:
>  	bpf_object__close(obj);
> diff --git a/tools/testing/selftests/bpf/prog_tests/map_lock.c b/tools/testing/selftests/bpf/prog_tests/map_lock.c
> index 2e78217ed3fd..ee99368c595c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/map_lock.c
> +++ b/tools/testing/selftests/bpf/prog_tests/map_lock.c
> @@ -9,12 +9,12 @@ static void *parallel_map_access(void *arg)
>  	for (i = 0; i < 10000; i++) {
>  		err = bpf_map_lookup_elem_flags(map_fd, &key, vars, BPF_F_LOCK);
>  		if (err) {
> -			test__printf("lookup failed\n");
> +			printf("lookup failed\n");
>  			error_cnt++;
>  			goto out;
>  		}
>  		if (vars[0] != 0) {
> -			test__printf("lookup #%d var[0]=%d\n", i, vars[0]);
> +			printf("lookup #%d var[0]=%d\n", i, vars[0]);
>  			error_cnt++;
>  			goto out;
>  		}
> @@ -22,8 +22,8 @@ static void *parallel_map_access(void *arg)
>  		for (j = 2; j < 17; j++) {
>  			if (vars[j] == rnd)
>  				continue;
> -			test__printf("lookup #%d var[1]=%d var[%d]=%d\n",
> -				     i, rnd, j, vars[j]);
> +			printf("lookup #%d var[1]=%d var[%d]=%d\n",
> +			       i, rnd, j, vars[j]);
>  			error_cnt++;
>  			goto out;
>  		}
> @@ -43,7 +43,7 @@ void test_map_lock(void)
>  
>  	err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
>  	if (err) {
> -		test__printf("test_map_lock:bpf_prog_load errno %d\n", errno);
> +		printf("test_map_lock:bpf_prog_load errno %d\n", errno);
>  		goto close_prog;
>  	}
>  	map_fd[0] = bpf_find_map(__func__, obj, "hash_map");
> diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> index 461b423d0584..1575f0a1f586 100644
> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
> @@ -202,8 +202,8 @@ static int test_send_signal_nmi(void)
>  			 -1 /* cpu */, -1 /* group_fd */, 0 /* flags */);
>  	if (pmu_fd == -1) {
>  		if (errno == ENOENT) {
> -			test__printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
> -				     __func__);
> +			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n",
> +			       __func__);
>  			return 0;
>  		}
>  		/* Let the test fail with a more informative message */
> diff --git a/tools/testing/selftests/bpf/prog_tests/spinlock.c b/tools/testing/selftests/bpf/prog_tests/spinlock.c
> index deb2db5b85b0..114ebe6a438e 100644
> --- a/tools/testing/selftests/bpf/prog_tests/spinlock.c
> +++ b/tools/testing/selftests/bpf/prog_tests/spinlock.c
> @@ -12,7 +12,7 @@ void test_spinlock(void)
>  
>  	err = bpf_prog_load(file, BPF_PROG_TYPE_CGROUP_SKB, &obj, &prog_fd);
>  	if (err) {
> -		test__printf("test_spin_lock:bpf_prog_load errno %d\n", errno);
> +		printf("test_spin_lock:bpf_prog_load errno %d\n", errno);
>  		goto close_prog;
>  	}
>  	for (i = 0; i < 4; i++)
> diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
> index 356d2c017a9c..ac44fda84833 100644
> --- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
> +++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
> @@ -109,8 +109,8 @@ void test_stacktrace_build_id(void)
>  	if (build_id_matches < 1 && retry--) {
>  		bpf_link__destroy(link);
>  		bpf_object__close(obj);
> -		test__printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
> -			     __func__);
> +		printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
> +		       __func__);
>  		goto retry;
>  	}
>  
> diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
> index f44f2c159714..9557b7dfb782 100644
> --- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
> +++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
> @@ -140,8 +140,8 @@ void test_stacktrace_build_id_nmi(void)
>  	if (build_id_matches < 1 && retry--) {
>  		bpf_link__destroy(link);
>  		bpf_object__close(obj);
> -		test__printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
> -			     __func__);
> +		printf("%s:WARN:Didn't find expected build ID from the map, retrying\n",
> +		       __func__);
>  		goto retry;
>  	}
>  
> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
> index b5404494b8aa..15f7c272edb0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_noinline.c
> @@ -75,8 +75,8 @@ void test_xdp_noinline(void)
>  	}
>  	if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
>  		error_cnt++;
> -		test__printf("test_xdp_noinline:FAIL:stats %lld %lld\n",
> -			     bytes, pkts);
> +		printf("test_xdp_noinline:FAIL:stats %lld %lld\n",
> +		       bytes, pkts);
>  	}
>  out:
>  	bpf_object__close(obj);
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index 00d1565d01a3..71c717162ac8 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -106,20 +106,6 @@ void test__force_log() {
>  	env.test->force_log = true;
>  }
>  
> -void test__vprintf(const char *fmt, va_list args)
> -{
> -	vprintf(fmt, args);
> -}
> -
> -void test__printf(const char *fmt, ...)
> -{
> -	va_list args;
> -
> -	va_start(args, fmt);
> -	test__vprintf(fmt, args);
> -	va_end(args);
> -}
> -
>  struct ipv4_packet pkt_v4 = {
>  	.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
>  	.iph.ihl = 5,
> @@ -311,7 +297,7 @@ static int libbpf_print_fn(enum libbpf_print_level level,
>  {
>  	if (!env.very_verbose && level == LIBBPF_DEBUG)
>  		return 0;
> -	test__vprintf(format, args);
> +	vprintf(format, args);
>  	return 0;
>  }
>  
> diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> index 9fd89078494f..cf0486dbb9b4 100644
> --- a/tools/testing/selftests/bpf/test_progs.h
> +++ b/tools/testing/selftests/bpf/test_progs.h
> @@ -69,8 +69,6 @@ extern int error_cnt;
>  extern int pass_cnt;
>  extern struct test_env env;
>  
> -extern void test__printf(const char *fmt, ...);
> -extern void test__vprintf(const char *fmt, va_list args);
>  extern void test__force_log();
>  extern bool test__start_subtest(const char *name);
>  
> @@ -96,12 +94,12 @@ extern struct ipv6_packet pkt_v6;
>  	int __ret = !!(condition);					\
>  	if (__ret) {							\
>  		error_cnt++;						\
> -		test__printf("%s:FAIL:%s ", __func__, tag);		\
> -		test__printf(format);					\
> +		printf("%s:FAIL:%s ", __func__, tag);			\
> +		printf(format);						\
>  	} else {							\
>  		pass_cnt++;						\
> -		test__printf("%s:PASS:%s %d nsec\n",			\
> -			      __func__, tag, duration);			\
> +		printf("%s:PASS:%s %d nsec\n",				\
> +		       __func__, tag, duration);			\
>  	}								\
>  	__ret;								\
>  })

^ permalink raw reply

* Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Bernd @ 2019-08-02 19:58 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev, edumazet
In-Reply-To: <CADVnQy=dvmksVaDu61+w-qtv2g_iNbWPFgbSJDtx9QaasmHonw@mail.gmail.com>

2019-08-02 21:14 GMT+02:00, Neal Cardwell <ncardwell@google.com>:
> What's the exact kernel version you are using?

It is the RHEL errata kernel 3.10.0-957.21.3.el7 (rhsa-2019:1481), i
need to check if there is a newer one.

> Eric submitted a patch recently that may address your issue:
>    tcp: be more careful in tcp_fragment()
>
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=b617158dc096709d8600c53b6052144d12b89fab
>
> Would you be able to test your workload with that commit
> cherry-picked, and see if the issue still occurs?

It only happens on a customer system in production up to now, so most
likely not.

> That commit was targeted to many stable releases, so you may be able
> to pick up that fix from a stable branch.

The only thing which is a bit strange, this is a Java client, and I am
pretty sure we don’t set a small SO_SNDBUF, if anything it is
increased (I need to verify that).

Not to worry, I guess I can now with your helpful pointer sort that
out with Redhat.

gruss
Bernd

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] selftests/bpf: test_progs: switch to open_memstream
From: Andrii Nakryiko @ 2019-08-02 19:57 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com
In-Reply-To: <20190802171710.11456-2-sdf@google.com>

On 8/2/19 10:17 AM, Stanislav Fomichev wrote:
> Use open_memstream to override stdout during test execution.
> The copy of the original stdout is held in env.stdout and used
> to print subtest info and dump failed log.

I really like the idea. I didn't know about open_memstream, it's awesome. Thanks!

>
> test_{v,}printf are now simple wrappers around stdout and will be
> removed in the next patch.
>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/testing/selftests/bpf/test_progs.c | 100 ++++++++++-------------
>  tools/testing/selftests/bpf/test_progs.h |   2 +-
>  2 files changed, 46 insertions(+), 56 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index db00196c8315..00d1565d01a3 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -40,14 +40,22 @@ static bool should_run(struct test_selector *sel, int num, const char *name)
>  
>  static void dump_test_log(const struct prog_test_def *test, bool failed)
>  {
> -	if (env.verbose || test->force_log || failed) {
> -		if (env.log_cnt) {
> -			fprintf(stdout, "%s", env.log_buf);
> -			if (env.log_buf[env.log_cnt - 1] != '\n')
> -				fprintf(stdout, "\n");
> +	if (stdout == env.stdout)
> +		return;
> +
> +	fflush(stdout); /* exports env.log_buf & env.log_cap */
> +
> +	if (env.log_cap && (env.verbose || test->force_log || failed)) {
> +		int len = strlen(env.log_buf);

env.log_cap is not really a capacity, it's actual number of bytes (without terminating zero), so there is no need to do strlen and it's probably better to rename env.log_cap into env.log_cnt.


> +
> +		if (len) {
> +			fprintf(env.stdout, "%s", env.log_buf);
> +			if (env.log_buf[len - 1] != '\n')
> +				fprintf(env.stdout, "\n");
> +
> +			fseeko(stdout, 0, SEEK_SET);
Same bug as I already fixed with env.log_cnt = 0 being inside this if. You want to do seek always, not just when you print output log.
>  /* rewind */
>  		}
>  	}
> -	env.log_cnt = 0;
>  }
>  
>  void test__end_subtest()
> @@ -62,7 +70,7 @@ void test__end_subtest()
>  
>  	dump_test_log(test, sub_error_cnt);
>  
> -	printf("#%d/%d %s:%s\n",
> +	fprintf(env.stdout, "#%d/%d %s:%s\n",
>  	       test->test_num, test->subtest_num,
>  	       test->subtest_name, sub_error_cnt ? "FAIL" : "OK");
>  }
> @@ -100,53 +108,7 @@ void test__force_log() {
>  
>  void test__vprintf(const char *fmt, va_list args)
>  {
> -	size_t rem_sz;
> -	int ret = 0;
> -
> -	if (env.verbose || (env.test && env.test->force_log)) {
> -		vfprintf(stderr, fmt, args);
> -		return;
> -	}
> -
> -try_again:
> -	rem_sz = env.log_cap - env.log_cnt;
> -	if (rem_sz) {
> -		va_list ap;
> -
> -		va_copy(ap, args);
> -		/* we reserved extra byte for \0 at the end */
> -		ret = vsnprintf(env.log_buf + env.log_cnt, rem_sz + 1, fmt, ap);
> -		va_end(ap);
> -
> -		if (ret < 0) {
> -			env.log_buf[env.log_cnt] = '\0';
> -			fprintf(stderr, "failed to log w/ fmt '%s'\n", fmt);
> -			return;
> -		}
> -	}
> -
> -	if (!rem_sz || ret > rem_sz) {
> -		size_t new_sz = env.log_cap * 3 / 2;
> -		char *new_buf;
> -
> -		if (new_sz < 4096)
> -			new_sz = 4096;
> -		if (new_sz < ret + env.log_cnt)
> -			new_sz = ret + env.log_cnt;
> -
> -		/* +1 for guaranteed space for terminating \0 */
> -		new_buf = realloc(env.log_buf, new_sz + 1);
> -		if (!new_buf) {
> -			fprintf(stderr, "failed to realloc log buffer: %d\n",
> -				errno);
> -			return;
> -		}
> -		env.log_buf = new_buf;
> -		env.log_cap = new_sz;
> -		goto try_again;
> -	}
> -
> -	env.log_cnt += ret;
> +	vprintf(fmt, args);
>  }
>  
>  void test__printf(const char *fmt, ...)
> @@ -477,6 +439,32 @@ static error_t parse_arg(int key, char *arg, struct argp_state *state)
>  	return 0;
>  }
>  
> +static void stdout_hijack(void)
> +{
> +	if (env.verbose || (env.test && env.test->force_log)) {
> +		/* nothing to do, output to stdout by default */
> +		return;
> +	}
> +
> +	/* stdout -> buffer */
> +	fflush(stdout);
> +	stdout = open_memstream(&env.log_buf, &env.log_cap);
Check errors and restore original stdout if something went wrong? (And emit some warning to stderr).
> +}
> +
> +static void stdout_restore(void)
> +{
> +	if (stdout == env.stdout)
> +		return;
> +
> +	fclose(stdout);
> +	free(env.log_buf);
> +
> +	env.log_buf = NULL;
> +	env.log_cap = 0;
> +
> +	stdout = env.stdout;
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	static const struct argp argp = {
> @@ -495,6 +483,7 @@ int main(int argc, char **argv)
>  	srand(time(NULL));
>  
>  	env.jit_enabled = is_jit_enabled();
> +	env.stdout = stdout;
>  
>  	for (i = 0; i < prog_test_cnt; i++) {
>  		struct prog_test_def *test = &prog_test_defs[i];
> @@ -508,6 +497,7 @@ int main(int argc, char **argv)
>  				test->test_num, test->test_name))
>  			continue;
>  
> +		stdout_hijack();
Why do you do this for every test? Just do once before all the tests and restore after?
>  		test->run_test();
>  		/* ensure last sub-test is finalized properly */
>  		if (test->subtest_name)
> @@ -522,6 +512,7 @@ int main(int argc, char **argv)
>  			env.succ_cnt++;
>  
>  		dump_test_log(test, test->error_cnt);
> +		stdout_restore();
>  
>  		printf("#%d %s:%s\n", test->test_num, test->test_name,
>  		       test->error_cnt ? "FAIL" : "OK");
> @@ -529,7 +520,6 @@ int main(int argc, char **argv)
>  	printf("Summary: %d/%d PASSED, %d FAILED\n",
>  	       env.succ_cnt, env.sub_succ_cnt, env.fail_cnt);
>  
> -	free(env.log_buf);
>  	free(env.test_selector.num_set);
>  	free(env.subtest_selector.num_set);
>  
> diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
> index afd14962456f..9fd89078494f 100644
> --- a/tools/testing/selftests/bpf/test_progs.h
> +++ b/tools/testing/selftests/bpf/test_progs.h
> @@ -56,8 +56,8 @@ struct test_env {
>  	bool jit_enabled;
>  
>  	struct prog_test_def *test;
> +	FILE *stdout;
>  	char *log_buf;
> -	size_t log_cnt;
>  	size_t log_cap;
So it's actually log_cnt that's assigned on fflush for memstream, according to man page, so probably keep log_cnt, delete log_cap.
>  
>  	int succ_cnt; /* successful tests */

^ permalink raw reply

* [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Jose Carlos Cazarin Filho @ 2019-08-02 19:56 UTC (permalink / raw)
  To: isdn, gregkh; +Cc: netdev, devel, linux-kernel

Fix checkpath error:
CHECK: spaces preferred around that '*' (ctx:WxV)
+extern hysdn_card *card_root;        /* pointer to first card */

Signed-off-by: Jose Carlos Cazarin Filho <joseespiriki@gmail.com>
---
 Hello all!
 This is my first commit to the Linux Kernel, I'm doing this to learn and be able
 to contribute more in the future
 Peace all! 

 drivers/staging/isdn/hysdn/hysdn_defs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/isdn/hysdn/hysdn_defs.h b/drivers/staging/isdn/hysdn/hysdn_defs.h
index cdac46a21..f20150d62 100644
--- a/drivers/staging/isdn/hysdn/hysdn_defs.h
+++ b/drivers/staging/isdn/hysdn/hysdn_defs.h
@@ -220,7 +220,7 @@ typedef struct hycapictrl_info hycapictrl_info;
 /*****************/
 /* exported vars */
 /*****************/
-extern hysdn_card *card_root;	/* pointer to first card */
+extern hysdn_card * card_root;	/* pointer to first card */
 
 
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Tao Ren @ 2019-08-02 19:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, Heiner Kallweit, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	openbmc@lists.ozlabs.org
In-Reply-To: <20190802145011.GH2099@lunn.ch>

On 8/2/19 7:50 AM, Andrew Lunn wrote:
>> +static int bcm54616s_read_status(struct phy_device *phydev)
>> +{
>> +	int err;
>> +
>> +	err = genphy_read_status(phydev);
>> +
>> +	/* 1000Base-X register set doesn't provide speed fields: the
>> +	 * link speed is always 1000 Mb/s as long as link is up.
>> +	 */
>> +	if (phydev->dev_flags & PHY_BCM_FLAGS_MODE_1000BX &&
>> +	    phydev->link)
>> +		phydev->speed = SPEED_1000;
>> +
>> +	return err;
>> +}
> 
> This function is equivalent to bcm5482_read_status(). You should use
> it, rather than add a new function.

Thank you for pointing it out. Will fix the code.

BTW, should I update the patch subject to something more descriptive (such as "net: phy: broadcom: fix BCM54616S read_status in 1000X mode")? Or I should use the same title to avoid confusion?


Thanks,

Tao

^ permalink raw reply

* [PATCH] drivers:staging:isdn:hysdn brace same line if
From: Fernando Eckhardt Valle @ 2019-08-02 19:51 UTC (permalink / raw)
  To: isdn, gregkh, netdev, devel, linux-kernel

Fix checkpatch error "ERROR: that open brace { should be on the previous
line" in drivers/staging/isdn/hysdn/hycapi.c:51.

Signed-off-by: Fernando Eckhardt Valle <phervalle@gmail.com>
---
 drivers/staging/isdn/hysdn/hycapi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/isdn/hysdn/hycapi.c b/drivers/staging/isdn/hysdn/hycapi.c
index a2c15cd7b..e5dc707d5 100644
--- a/drivers/staging/isdn/hysdn/hycapi.c
+++ b/drivers/staging/isdn/hysdn/hycapi.c
@@ -49,8 +49,7 @@ static u16 hycapi_send_message(struct capi_ctr *ctrl, struct sk_buff *skb);
 static inline int _hycapi_appCheck(int app_id, int ctrl_no)
 {
 	if ((ctrl_no <= 0) || (ctrl_no > CAPI_MAXCONTR) || (app_id <= 0) ||
-	   (app_id > CAPI_MAXAPPL))
-	{
+	   (app_id > CAPI_MAXAPPL)) {
 		printk(KERN_ERR "HYCAPI: Invalid request app_id %d for controller %d", app_id, ctrl_no);
 		return -1;
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH] isdn: hysdn: fix code style error from checkpatch
From: Ricardo Bruno Lopes da Silva @ 2019-08-02 19:50 UTC (permalink / raw)
  To: isdn; +Cc: gregkh, netdev, devel, linux-kernel, lkcamp

Fix error bellow from checkpatch.

WARNING: Block comments use * on subsequent lines
+/***********************************************************
+

Signed-off-by: Ricardo Bruno Lopes da Silva <ricardo6142@gmail.com>
---
 Hi! This is my first patch, I am learning how to contribute to Linux
kernel. Let me know if you have any suggestions.

Thanks, 
Ricardo Bruno

 drivers/staging/isdn/hysdn/hycapi.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/staging/isdn/hysdn/hycapi.c b/drivers/staging/isdn/hysdn/hycapi.c
index a2c15cd7b..b7ba28d40 100644
--- a/drivers/staging/isdn/hysdn/hycapi.c
+++ b/drivers/staging/isdn/hysdn/hycapi.c
@@ -107,11 +107,8 @@ hycapi_remove_ctr(struct capi_ctr *ctrl)
 	card->hyctrlinfo = NULL;
 }
 
-/***********************************************************
-
-Queue a CAPI-message to the controller.
+/* Queue a CAPI-message to the controller. */
 
-***********************************************************/
 
 static void
 hycapi_sendmsg_internal(struct capi_ctr *ctrl, struct sk_buff *skb)
-- 
2.20.1


^ permalink raw reply related

* Re: [net-next 01/12] net/mlx5: E-Switch, add ingress rate support
From: Alexei Starovoitov @ 2019-08-02 19:44 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Eli Cohen, davem@davemloft.net, netdev@vger.kernel.org,
	Paul Blakey
In-Reply-To: <b2c77010e96b5fdb6693e5cf0a46a2017f389b44.camel@mellanox.com>

On Fri, Aug 02, 2019 at 07:22:21PM +0000, Saeed Mahameed wrote:
> On Fri, 2019-08-02 at 10:37 -0700, Alexei Starovoitov wrote:
> > On Thu, Aug 1, 2019 at 6:30 PM Saeed Mahameed <saeedm@mellanox.com>
> > wrote:
> > > From: Eli Cohen <eli@mellanox.com>
> > > 
> > > Use the scheduling elements to implement ingress rate limiter on an
> > > eswitch ports ingress traffic. Since the ingress of eswitch port is
> > > the
> > > egress of VF port, we control eswitch ingress by controlling VF
> > > egress.
> > 
> > Looks like the patch is only passing args to firmware which is doing
> > the magic.
> > Can you please describe what is the algorithm there?
> > Is it configurable?
> 
> Hi Alexei, 
> 
> I am not sure how much details you are looking for, but let me share
> some of what i know:
> 
> From a previous submission for legacy mode sriov vf bw limit, where we 
> introduced the FW configuration API and the legacy sriov use case: 
> https://patchwork.kernel.org/patch/9404655/
> 
> So basically the algorithm is Deficit Weighted Round Robin (DWRR)
> between the agents, we can control BW allocation/weight of each agent
> (vf vport).

commit log of this patch says nothing about DWRR.
It is also not using any of the api that were provided by that
earlier patch.
what is going on?


^ permalink raw reply

* Re: [PATCH v4 4/4] net: phy: realtek: configure RTL8211E LEDs
From: Matthias Kaehlcke @ 2019-08-02 19:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190802181840.GP2099@lunn.ch>

Hi Andrew,

On Fri, Aug 02, 2019 at 08:18:40PM +0200, Andrew Lunn wrote:
> On Thu, Aug 01, 2019 at 12:07:59PM -0700, Matthias Kaehlcke wrote:
> > Configure the RTL8211E LEDs behavior when the device tree property
> > 'realtek,led-modes' is specified.

note to self: update commit message

> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> 
> Hi Matthias
> 
> I was more thinking of adding a new driver call to the PHY driver API,
> to configure an LED. Something like
> 
> rtl8211e_config_leds(phydev, int led, struct phy_led_config cfg);

I guess it sould be singular ('_config_led') if it configures a single
LED.

> It would be called by the phylib core after config_init(). But also,
> thinking ahead to generic linux LED support, it could be called later
> to reconfigure the LEDs to use a different trigger. The standard LED
> sysfs interface would be used.

I'll look into the phylib part.

Thanks

Matthias

^ permalink raw reply

* [PATCH net-next] net: dsa: dump CPU port regs through master
From: Vivien Didelot @ 2019-08-02 19:34 UTC (permalink / raw)
  To: netdev; +Cc: f.fainelli, andrew, davem, linville, cphealy, Vivien Didelot
In-Reply-To: <20190802193455.17126-1-vivien.didelot@gmail.com>

Merge the CPU port registers dump into the master interface registers
dump through ethtool, by nesting the ethtool_drvinfo and ethtool_regs
structures of the CPU port into the dump.

drvinfo->regdump_len will contain the full data length, while regs->len
will contain only the master interface registers dump length.

This allows for example to dump the CPU port registers on a ZII Dev
C board like this:

    # ethtool -d eth1
    0x004:                                              0x00000000
    0x008:                                              0x0a8000aa
    0x010:                                              0x01000000
    0x014:                                              0x00000000
    0x024:                                              0xf0000102
    0x040:                                              0x6d82c800
    0x044:                                              0x00000020
    0x064:                                              0x40000000
    0x084: RCR (Receive Control Register)               0x47c00104
        MAX_FL (Maximum frame length)                   1984
        FCE (Flow control enable)                       0
        BC_REJ (Broadcast frame reject)                 0
        PROM (Promiscuous mode)                         0
        DRT (Disable receive on transmit)               0
        LOOP (Internal loopback)                        0
    0x0c4: TCR (Transmit Control Register)              0x00000004
        RFC_PAUSE (Receive frame control pause)         0
        TFC_PAUSE (Transmit frame control pause)        0
        FDEN (Full duplex enable)                       1
        HBC (Heartbeat control)                         0
        GTS (Graceful transmit stop)                    0
    0x0e4:                                              0x76735d6d
    0x0e8:                                              0x7e9e8808
    0x0ec:                                              0x00010000
    .
    .
    .
    88E6352  Switch Port Registers
    ------------------------------
    00: Port Status                            0x4d04
          Pause Enabled                        0
          My Pause                             1
          802.3 PHY Detected                   0
          Link Status                          Up
          Duplex                               Full
          Speed                                100 or 200 Mbps
          EEE Enabled                          0
          Transmitter Paused                   0
          Flow Control                         0
          Config Mode                          0x4
    01: Physical Control                       0x003d
          RGMII Receive Timing Control         Default
          RGMII Transmit Timing Control        Default
          200 BASE Mode                        100
          Flow Control's Forced value          0
          Force Flow Control                   0
          Link's Forced value                  Up
          Force Link                           1
          Duplex's Forced value                Full
          Force Duplex                         1
          Force Speed                          100 or 200 Mbps
    .
    .
    .

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/master.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/net/dsa/master.c b/net/dsa/master.c
index 4b52f8bac5e1..a8e52c9967f4 100644
--- a/net/dsa/master.c
+++ b/net/dsa/master.c
@@ -8,6 +8,70 @@
 
 #include "dsa_priv.h"
 
+static int dsa_master_get_regs_len(struct net_device *dev)
+{
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
+	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
+	struct dsa_switch *ds = cpu_dp->ds;
+	int port = cpu_dp->index;
+	int ret = 0;
+	int len;
+
+	if (ops->get_regs_len) {
+		len = ops->get_regs_len(dev);
+		if (len < 0)
+			return len;
+		ret += len;
+	}
+
+	ret += sizeof(struct ethtool_drvinfo);
+	ret += sizeof(struct ethtool_regs);
+
+	if (ds->ops->get_regs_len) {
+		len = ds->ops->get_regs_len(ds, port);
+		if (len < 0)
+			return len;
+		ret += len;
+	}
+
+	return ret;
+}
+
+static void dsa_master_get_regs(struct net_device *dev,
+				struct ethtool_regs *regs, void *data)
+{
+	struct dsa_port *cpu_dp = dev->dsa_ptr;
+	const struct ethtool_ops *ops = cpu_dp->orig_ethtool_ops;
+	struct dsa_switch *ds = cpu_dp->ds;
+	struct ethtool_drvinfo *cpu_info;
+	struct ethtool_regs *cpu_regs;
+	int port = cpu_dp->index;
+	int len;
+
+	if (ops->get_regs_len && ops->get_regs) {
+		len = ops->get_regs_len(dev);
+		if (len < 0)
+			return;
+		regs->len = len;
+		ops->get_regs(dev, regs, data);
+		data += regs->len;
+	}
+
+	cpu_info = (struct ethtool_drvinfo *)data;
+	strlcpy(cpu_info->driver, "dsa", sizeof(cpu_info->driver));
+	data += sizeof(*cpu_info);
+	cpu_regs = (struct ethtool_regs *)data;
+	data += sizeof(*cpu_regs);
+
+	if (ds->ops->get_regs_len && ds->ops->get_regs) {
+		len = ds->ops->get_regs_len(ds, port);
+		if (len < 0)
+			return;
+		cpu_regs->len = len;
+		ds->ops->get_regs(ds, port, cpu_regs, data);
+	}
+}
+
 static void dsa_master_get_ethtool_stats(struct net_device *dev,
 					 struct ethtool_stats *stats,
 					 uint64_t *data)
@@ -147,6 +211,8 @@ static int dsa_master_ethtool_setup(struct net_device *dev)
 	if (cpu_dp->orig_ethtool_ops)
 		memcpy(ops, cpu_dp->orig_ethtool_ops, sizeof(*ops));
 
+	ops->get_regs_len = dsa_master_get_regs_len;
+	ops->get_regs = dsa_master_get_regs;
 	ops->get_sset_count = dsa_master_get_sset_count;
 	ops->get_ethtool_stats = dsa_master_get_ethtool_stats;
 	ops->get_strings = dsa_master_get_strings;
-- 
2.22.0


^ permalink raw reply related

* [PATCH ethtool] ethtool: dump nested registers
From: Vivien Didelot @ 2019-08-02 19:34 UTC (permalink / raw)
  To: netdev; +Cc: f.fainelli, andrew, davem, linville, cphealy, Vivien Didelot

Usually kernel drivers set the regs->len value to the same length as
info->regdump_len, which was used for the allocation. In case where
regs->len is smaller than the allocated info->regdump_len length,
we may assume that the dump contains a nested set of registers.

This becomes handy for kernel drivers to expose registers of an
underlying network conduit unfortunately not exposed to userspace,
as found in network switching equipment for example.

This patch adds support for recursing into the dump operation if there
is enough room for a nested ethtool_drvinfo structure containing a
valid driver name, followed by a ethtool_regs structure like this:

    0      regs->len                        info->regdump_len
    v              v                                        v
    +--------------+-----------------+--------------+-- - --+
    | ethtool_regs | ethtool_drvinfo | ethtool_regs |       |
    +--------------+-----------------+--------------+-- - --+

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 ethtool.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 05fe05a08..c0e2903c5 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1245,7 +1245,7 @@ static int dump_regs(int gregs_dump_raw, int gregs_dump_hex,
 
 	if (gregs_dump_raw) {
 		fwrite(regs->data, regs->len, 1, stdout);
-		return 0;
+		goto nested;
 	}
 
 	if (!gregs_dump_hex)
@@ -1253,7 +1253,7 @@ static int dump_regs(int gregs_dump_raw, int gregs_dump_hex,
 			if (!strncmp(driver_list[i].name, info->driver,
 				     ETHTOOL_BUSINFO_LEN)) {
 				if (driver_list[i].func(info, regs) == 0)
-					return 0;
+					goto nested;
 				/* This version (or some other
 				 * variation in the dump format) is
 				 * not handled; fall back to hex
@@ -1263,6 +1263,15 @@ static int dump_regs(int gregs_dump_raw, int gregs_dump_hex,
 
 	dump_hex(stdout, regs->data, regs->len, 0);
 
+nested:
+	/* Recurse dump if some drvinfo and regs structures are nested */
+	if (info->regdump_len > regs->len + sizeof(*info) + sizeof(*regs)) {
+		info = (struct ethtool_drvinfo *)(&regs->data[0] + regs->len);
+		regs = (struct ethtool_regs *)(&regs->data[0] + regs->len + sizeof(*info));
+
+		return dump_regs(gregs_dump_raw, gregs_dump_hex, info, regs);
+	}
+
 	return 0;
 }
 
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: John Hubbard @ 2019-08-02 19:33 UTC (permalink / raw)
  To: Peter Zijlstra, john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	linux-nfs, linux-rdma, linux-rpi-kernel, linux-xfs, netdev,
	rds-devel, sparclinux, x86, xen-devel
In-Reply-To: <20190802080554.GD2332@hirez.programming.kicks-ass.net>

On 8/2/19 1:05 AM, Peter Zijlstra wrote:
> On Thu, Aug 01, 2019 at 07:16:19PM -0700, john.hubbard@gmail.com wrote:
> 
>> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
>> ("mm: introduce put_user_page*(), placeholder versions"). That commit
>> has an extensive description of the problem and the planned steps to
>> solve it, but the highlites are:
> 
> That is one horridly mangled Changelog there :-/ It looks like it's
> partially duplicated.

Yeah. It took so long to merge that I think I was no longer able to
actually see the commit description, after N readings. sigh

> 
> Anyway; no objections to any of that, but I just wanted to mention that
> there are other problems with long term pinning that haven't been
> mentioned, notably they inhibit compaction.
> 
> A long time ago I proposed an interface to mark pages as pinned, such
> that we could run compaction before we actually did the pinning.
> 

This is all heading toward marking pages as pinned, so we should finally
get there.  I'll post the RFC for tracking pinned pages shortly.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH 20/34] xen: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-08-02 19:25 UTC (permalink / raw)
  To: Weiny, Ira, Juergen Gross, john.hubbard@gmail.com, Andrew Morton
  Cc: devel@driverdev.osuosl.org, Dave Chinner, Christoph Hellwig,
	Williams, Dan J, x86@kernel.org, linux-mm@kvack.org, Dave Hansen,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org,
	linux-arm-kernel@lists.infradead.org,
	linux-rpi-kernel@lists.infradead.org, devel@lists.orangefs.org,
	xen-devel@lists.xenproject.org, Boris Ostrovsky,
	rds-devel@oss.oracle.com, Jérôme Glisse, Jan Kara,
	ceph-devel@vger.kernel.org, kvm@vger.kernel.org,
	linux-block@vger.kernel.org, linux-crypto@vger.kernel.org,
	linux-fbdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML,
	linux-media@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-xfs@vger.kernel.org,
	netdev@vger.kernel.org, sparclinux@vger.kernel.org,
	Jason Gunthorpe
In-Reply-To: <2807E5FD2F6FDA4886F6618EAC48510E79E66216@CRSMSX101.amr.corp.intel.com>

On 8/2/19 9:09 AM, Weiny, Ira wrote:
>>
>> On 02.08.19 07:48, John Hubbard wrote:
>>> On 8/1/19 9:36 PM, Juergen Gross wrote:
>>>> On 02.08.19 04:19, john.hubbard@gmail.com wrote:
>>>>> From: John Hubbard <jhubbard@nvidia.com>
>>> ...
>>> If that's not the case (both here, and in 3 or 4 other patches in this
>>> series, then as you said, I should add NULL checks to put_user_pages()
>>> and put_user_pages_dirty_lock().
>>
>> In this case it is not correct, but can easily be handled. The NULL case can
>> occur only in an error case with the pages array filled partially or not at all.
>>
>> I'd prefer something like the attached patch here.
> 
> I'm not an expert in this code and have not looked at it carefully but that patch does seem to be the better fix than forcing NULL checks on everyone.
> 

OK, I'll use Juergen's approach, and also check for that pattern in the
other patches.


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [net-next 01/12] net/mlx5: E-Switch, add ingress rate support
From: Saeed Mahameed @ 2019-08-02 19:22 UTC (permalink / raw)
  To: alexei.starovoitov@gmail.com
  Cc: Eli Cohen, davem@davemloft.net, netdev@vger.kernel.org,
	Paul Blakey
In-Reply-To: <CAADnVQ+VOSYxbF9RiMJx4kY9bxJCS+Tsf97nsOnRLvi2r6RCog@mail.gmail.com>

On Fri, 2019-08-02 at 10:37 -0700, Alexei Starovoitov wrote:
> On Thu, Aug 1, 2019 at 6:30 PM Saeed Mahameed <saeedm@mellanox.com>
> wrote:
> > From: Eli Cohen <eli@mellanox.com>
> > 
> > Use the scheduling elements to implement ingress rate limiter on an
> > eswitch ports ingress traffic. Since the ingress of eswitch port is
> > the
> > egress of VF port, we control eswitch ingress by controlling VF
> > egress.
> 
> Looks like the patch is only passing args to firmware which is doing
> the magic.
> Can you please describe what is the algorithm there?
> Is it configurable?

Hi Alexei, 

I am not sure how much details you are looking for, but let me share
some of what i know:

From a previous submission for legacy mode sriov vf bw limit, where we 
introduced the FW configuration API and the legacy sriov use case: 
https://patchwork.kernel.org/patch/9404655/

So basically the algorithm is Deficit Weighted Round Robin (DWRR)
between the agents, we can control BW allocation/weight of each agent
(vf vport).

Quoting the commit message from the above link:

"The TSAR implements a Deficit Weighted Round Robin (DWRR) between the
agents. Each agent attached to the TSAR is assigned with a Weight. An
agent is awarded with transmission tokens according to its Weight, and
charged with transmission Tokens according to the amount of data it has
transmitted. Effectively, the Weight (relative to the other agents’
Weight) defines the percentage of the BW an agent will receive,
assuming it has enough data to sustain this BW.

This arbitration scheme is work-preserving, meaning that an agent not
using the entire BW it was allocated, hands over the excess BW, to be
redistributed among the other agents. Each agent will receive
additional BW according to its Weight."

Thanks,
Saeed.

^ permalink raw reply

* [PATCH net 2/2] tc-testing: updated vlan action tests with batch create/delete
From: Roman Mashak @ 2019-08-02 19:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1564773407-26209-1-git-send-email-mrv@mojatatu.com>

Update TDC tests with cases varifying ability of TC to install or delete
batches of vlan actions.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/vlan.json          | 94 ++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
index cc7c7d758008..6503b1ce091f 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/vlan.json
@@ -713,5 +713,99 @@
         "teardown": [
             "$TC actions flush action vlan"
         ]
+    },
+    {
+        "id": "294e",
+        "name": "Add batch of 32 vlan push actions with cookie",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action vlan push protocol 802.1q id 4094 priority 7 pipe index \\$i cookie aabbccddeeff112233445566778800a1 \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "56f7",
+        "name": "Delete batch of 32 vlan push actions",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action vlan push protocol 802.1q id 4094 priority 7 pipe index \\$i \\\"; args=\\\"\\$args\\$cmd\\\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action vlan index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
+    },
+    {
+        "id": "759f",
+        "name": "Add batch of 32 vlan pop actions with cookie",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action vlan pop continue index \\$i cookie aabbccddeeff112233445566778800a1 \\\"; args=\"\\$args\\$cmd\"; done && $TC actions add \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "32",
+        "teardown": [
+            "$TC actions flush action vlan"
+        ]
+    },
+    {
+        "id": "c84a",
+        "name": "Delete batch of 32 vlan pop actions",
+        "category": [
+            "actions",
+            "vlan"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action vlan",
+                0,
+                1,
+                255
+            ],
+            "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action vlan pop index \\$i \\\"; args=\\\"\\$args\\$cmd\\\"; done && $TC actions add \\$args\""
+        ],
+        "cmdUnderTest": "bash -c \"for i in \\`seq 1 32\\`; do cmd=\\\"action vlan index \\$i \\\"; args=\"\\$args\\$cmd\"; done && $TC actions del \\$args\"",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions list action vlan",
+        "matchPattern": "^[ \t]+index [0-9]+ ref",
+        "matchCount": "0",
+        "teardown": []
     }
 ]
-- 
2.7.4


^ permalink raw reply related

* [PATCH net 1/2] net sched: update vlan action for batched events operations
From: Roman Mashak @ 2019-08-02 19:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak
In-Reply-To: <1564773407-26209-1-git-send-email-mrv@mojatatu.com>

Add get_fill_size() routine used to calculate the action size
when building a batch of events.

Fixes: c7e2b9689 ("sched: introduce vlan action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 net/sched/act_vlan.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 9269d350fb8a..e0c97267bccb 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -306,6 +306,14 @@ static int tcf_vlan_search(struct net *net, struct tc_action **a, u32 index)
 	return tcf_idr_search(tn, a, index);
 }
 
+static size_t tcf_vlan_get_fill_size(const struct tc_action *act)
+{
+	return nla_total_size(sizeof(struct tc_vlan))
+		+ nla_total_size(sizeof(u16)) /* TCA_VLAN_PUSH_VLAN_ID */
+		+ nla_total_size(sizeof(u16)) /* TCA_VLAN_PUSH_VLAN_PROTOCOL */
+		+ nla_total_size(sizeof(u8)); /* TCA_VLAN_PUSH_VLAN_PRIORITY */
+}
+
 static struct tc_action_ops act_vlan_ops = {
 	.kind		=	"vlan",
 	.id		=	TCA_ID_VLAN,
@@ -315,6 +323,7 @@ static struct tc_action_ops act_vlan_ops = {
 	.init		=	tcf_vlan_init,
 	.cleanup	=	tcf_vlan_cleanup,
 	.walk		=	tcf_vlan_walker,
+	.get_fill_size	=	tcf_vlan_get_fill_size,
 	.lookup		=	tcf_vlan_search,
 	.size		=	sizeof(struct tcf_vlan),
 };
-- 
2.7.4


^ permalink raw reply related

* [PATCH net 0/2] Fix batched event generation for vlan action
From: Roman Mashak @ 2019-08-02 19:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

When adding or deleting a batch of entries, the kernel sends up to
TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
space. However it does not consider that the action sizes may vary and
require different skb sizes.

For example, consider the following script adding 32 entries with all
supported vlan parameters (in order to maximize netlink messages size):

% cat tc-batch.sh
TC="sudo /mnt/iproute2.git/tc/tc"

$TC actions flush action vlan
for i in `seq 1 $1`;
do
   cmd="action vlan push protocol 802.1q id 4094 priority 7 pipe \
               index $i cookie aabbccddeeff112233445566778800a1 "
   args=$args$cmd
done
$TC actions add $args
%
% ./tc-batch.sh 32
Error: Failed to fill netlink attributes while adding TC action.
We have an error talking to the kernel
%

patch 1 adds callback in tc_action_ops of vlan action, which calculates
the action size, and passes size to tcf_add_notify()/tcf_del_notify().

patch 2 updates the TDC test suite with relevant vlan test cases.

Roman Mashak (2):
  net sched: update vlan action for batched events operations
  tc-testing: updated vlan action tests with batch create/delete

 net/sched/act_vlan.c                               |  9 +++
 .../tc-testing/tc-tests/actions/vlan.json          | 94 ++++++++++++++++++++++
 2 files changed, 103 insertions(+)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: John Hubbard @ 2019-08-02 19:14 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox
  Cc: Michal Hocko, john.hubbard, Andrew Morton, Christoph Hellwig,
	Dan Williams, Dave Chinner, Dave Hansen, Ira Weiny,
	Jason Gunthorpe, Jérôme Glisse, LKML, amd-gfx,
	ceph-devel, devel, devel, dri-devel, intel-gfx, kvm,
	linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel
In-Reply-To: <20190802145227.GQ25064@quack2.suse.cz>

On 8/2/19 7:52 AM, Jan Kara wrote:
> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
>>>> On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
>>>> [...]
>>>>> 2) Convert all of the call sites for get_user_pages*(), to
>>>>> invoke put_user_page*(), instead of put_page(). This involves dozens of
>>>>> call sites, and will take some time.
>>>>
>>>> How do we make sure this is the case and it will remain the case in the
>>>> future? There must be some automagic to enforce/check that. It is simply
>>>> not manageable to do it every now and then because then 3) will simply
>>>> be never safe.
>>>>
>>>> Have you considered coccinele or some other scripted way to do the
>>>> transition? I have no idea how to deal with future changes that would
>>>> break the balance though.

Hi Michal,

Yes, I've thought about it, and coccinelle falls a bit short (it's not smart
enough to know which put_page()'s to convert). However, there is a debug
option planned: a yet-to-be-posted commit [1] uses struct page extensions
(obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add
a redundant counter. That allows:

void __put_page(struct page *page)
{
	...
	/* Someone called put_page() instead of put_user_page() */
	WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);

>>>
>>> Yeah, that's why I've been suggesting at LSF/MM that we may need to create
>>> a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
>>> references got converted by using this wrapper instead of gup. The
>>> counterpart would then be more logically named as unpin_page() or whatever
>>> instead of put_user_page().  Sure this is not completely foolproof (you can
>>> create new callsite using vaddr_pin_pages() and then just drop refs using
>>> put_page()) but I suppose it would be a high enough barrier for missed
>>> conversions... Thoughts?

The debug option above is still a bit simplistic in its implementation (and maybe
not taking full advantage of the data it has), but I think it's preferable,
because it monitors the "core" and WARNs.

Instead of the wrapper, I'm thinking: documentation and the passage of time,
plus the debug option (perhaps enhanced--probably once I post it someone will
notice opportunities), yes?

>>
>> I think the API we really need is get_user_bvec() / put_user_bvec(),
>> and I know Christoph has been putting some work into that.  That avoids
>> doing refcount operations on hundreds of pages if the page in question is
>> a huge page.  Once people are switched over to that, they won't be tempted
>> to manually call put_page() on the individual constituent pages of a bvec.
> 
> Well, get_user_bvec() is certainly a good API for one class of users but
> just looking at the above series, you'll see there are *many* places that
> just don't work with bvecs at all and you need something for those.
> 

Yes, there are quite a few places that don't involve _bvec, as we can see
right here. So we need something. Andrew asked for a debug option some time
ago, and several people (Dave Hansen, Dan Williams, Jerome) had the idea
of vmap-ing gup pages separately, so you can definitely tell where each
page came from. I'm hoping not to have to go to that level of complexity
though.


[1] "mm/gup: debug tracking of get_user_pages() references" :
https://github.com/johnhubbard/linux/commit/21ff7d6161ec2a14d3f9d17c98abb00cc969d4d6

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Neal Cardwell @ 2019-08-02 19:14 UTC (permalink / raw)
  To: Bernd; +Cc: netdev, Eric Dumazet
In-Reply-To: <CABOR3+yUiu1BzCojFQFADUKc5BT2-Ew_j7KFNpjP8WoMYZ+SMA@mail.gmail.com>

On Fri, Aug 2, 2019 at 3:03 PM Bernd <ecki@zusammenkunft.net> wrote:
>
> Hello,
>
> While analyzing a aborted upload packet capture I came across a odd
> trace where a sender was not responding to a duplicate SACK but
> sending further segments until it stalled.
>
> Took me some time until I remembered this fix, and actually the
> problems started since the security fix was applied.
>
> I see a high counter for TCPWqueueTooBig - and I don’t think that’s an
> actual attack.
>
> Is there a probability for triggering the limit with connections with
> big windows and large send buffers and dropped segments? If so what
> would be the plan? It does not look like it is configurable. The trace
> seem to have 100 (filled) inflight segments.
>
> Gruss
> Bernd
> --
> http://bernd.eckenfels.net

What's the exact kernel version you are using?

Eric submitted a patch recently that may address your issue:
   tcp: be more careful in tcp_fragment()
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=b617158dc096709d8600c53b6052144d12b89fab

Would you be able to test your workload with that commit
cherry-picked, and see if the issue still occurs?

That commit was targeted to many stable releases, so you may be able
to pick up that fix from a stable branch.

cheers,
neal

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox