* Re: [PATCH bpf-next 1/2] cpuidle/rcu: Making arch_cpu_idle and rcu_idle_exit noinstr
From: kernel test robot @ 2022-05-17 2:39 UTC (permalink / raw)
To: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Masami Hiramatsu, Paul E. McKenney
Cc: kbuild-all, netdev, bpf, lkml, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Steven Rostedt
In-Reply-To: <20220515203653.4039075-1-jolsa@kernel.org>
Hi Jiri,
I love your patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Jiri-Olsa/cpuidle-rcu-Making-arch_cpu_idle-and-rcu_idle_exit-noinstr/20220516-043752
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-c002-20220516 (https://download.01.org/0day-ci/archive/20220517/202205171050.9msZot6F-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/0b6fee32d730f621f2bfc4d8d9f0729814398415
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Jiri-Olsa/cpuidle-rcu-Making-arch_cpu_idle-and-rcu_idle_exit-noinstr/20220516-043752
git checkout 0b6fee32d730f621f2bfc4d8d9f0729814398415
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> vmlinux.o: warning: objtool: arch_cpu_idle()+0x0: call to {dynamic}() leaves .noinstr.text section
>> vmlinux.o: warning: objtool: rcu_idle_exit()+0x16: call to trace_hardirqs_off() leaves .noinstr.text section
vmlinux.o: warning: objtool: enter_from_user_mode()+0x18: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode()+0x1d: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare()+0x18: call to __kcsan_check_access() leaves .noinstr.text section
vmlinux.o: warning: objtool: irqentry_enter_from_user_mode()+0x18: call to __kcsan_check_access() leaves .noinstr.text section
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply
* [PATCH linux-next v2] net: ath5k: replace ternary operator with min()
From: Guo Zhengkui @ 2022-05-17 2:39 UTC (permalink / raw)
To: Jiri Slaby, Nick Kossifidis, Luis Chamberlain, Kalle Valo,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
open list:ATHEROS ATH5K WIRELESS DRIVER,
open list:NETWORKING DRIVERS, open list
Cc: zhengkui_guo, Guo Zhengkui
In-Reply-To: <874k1pxvca.fsf@kernel.org>
Fix the following coccicheck warning:
drivers/net/wireless/ath/ath5k/phy.c:3139:62-63: WARNING
opportunity for min()
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
---
drivers/net/wireless/ath/ath5k/phy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath5k/phy.c b/drivers/net/wireless/ath/ath5k/phy.c
index 00f9e347d414..5797ef9c73d7 100644
--- a/drivers/net/wireless/ath/ath5k/phy.c
+++ b/drivers/net/wireless/ath/ath5k/phy.c
@@ -3136,7 +3136,7 @@ ath5k_combine_pwr_to_pdadc_curves(struct ath5k_hw *ah,
pdadc_n = gain_boundaries[pdg] + pd_gain_overlap - pwr_min[pdg];
/* Limit it to be inside pwr range */
table_size = pwr_max[pdg] - pwr_min[pdg];
- max_idx = (pdadc_n < table_size) ? pdadc_n : table_size;
+ max_idx = min(pdadc_n, table_size);
/* Fill pdadc_out table */
while (pdadc_0 < max_idx && pdadc_i < 128)
--
2.20.1
^ permalink raw reply related
* [PATCH linux-next v2] net: ath9k: replace ternary operator with max()
From: Guo Zhengkui @ 2022-05-17 2:41 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Kalle Valo, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni,
open list:QUALCOMM ATHEROS ATH9K WIRELESS DRIVER,
open list:NETWORKING DRIVERS, open list
Cc: zhengkui_guo, Guo Zhengkui
In-Reply-To: <874k1pxvca.fsf@kernel.org>
Fix the following coccicheck warning:
drivers/net/wireless/ath/ath9k/dfs.c:249:28-30: WARNING
opportunity for max()
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
---
drivers/net/wireless/ath/ath9k/dfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath9k/dfs.c b/drivers/net/wireless/ath/ath9k/dfs.c
index acb9602aa464..11349218bc21 100644
--- a/drivers/net/wireless/ath/ath9k/dfs.c
+++ b/drivers/net/wireless/ath/ath9k/dfs.c
@@ -246,7 +246,7 @@ ath9k_postprocess_radar_event(struct ath_softc *sc,
DFS_STAT_INC(sc, dc_phy_errors);
/* when both are present use stronger one */
- rssi = (ard->rssi < ard->ext_rssi) ? ard->ext_rssi : ard->rssi;
+ rssi = max(ard->rssi, ard->ext_rssi);
break;
default:
/*
--
2.20.1
^ permalink raw reply related
* Re: [PATCH bpf-next] selftests/bpf: Add missing trampoline program type to trampoline_count test
From: Yonghong Song @ 2022-05-17 2:49 UTC (permalink / raw)
To: Yuntao Wang, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Song Liu, John Fastabend, KP Singh, Shuah Khan,
Yucong Sun, Toke Høiland-Jørgensen, Kui-Feng Lee,
Jiri Olsa, netdev, bpf, linux-kernel, linux-kselftest
In-Reply-To: <20220515063120.526063-1-ytcoode@gmail.com>
On 5/14/22 11:31 PM, Yuntao Wang wrote:
> Currently the trampoline_count test doesn't include any fmod_ret bpf
> programs, fix it to make the test cover all possible trampoline program
> types.
>
> Since fmod_ret bpf programs can't be attached to __set_task_comm function,
> as it's neither whitelisted for error injection nor a security hook, change
> it to bpf_modify_return_test.
>
> This patch also does some other cleanups such as removing duplicate code,
> dropping inconsistent comments, etc.
>
> Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
^ permalink raw reply
* Re: [RFC Patch net-next v2 1/9] net: dsa: microchip: ksz8795: update the port_cnt value in ksz_chip_data
From: Florian Fainelli @ 2022-05-17 2:53 UTC (permalink / raw)
To: Arun Ramadoss, linux-kernel, netdev
Cc: Russell King, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vivien Didelot, Vladimir Oltean, David S. Miller, Jakub Kicinski,
Paolo Abeni, Oleksij Rempel, Marek Vasut, Michael Grzeschik,
Eric Dumazet
In-Reply-To: <20220513102219.30399-2-arun.ramadoss@microchip.com>
On 5/13/2022 3:22 AM, Arun Ramadoss wrote:
> The port_cnt value in the structure is not used in the switch_init.
> Instead it uses the fls(chip->cpu_port), this is due to one of port in
> the ksz8794 unavailable. The cpu_port for the 8794 is 0x10, fls(0x10) =
> 5, hence updating it directly in the ksz_chip_data structure in order to
> same with all the other switches in ksz8795.c and ksz9477.c files.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH bpf-next 1/2] cpuidle/rcu: Making arch_cpu_idle and rcu_idle_exit noinstr
From: Yonghong Song @ 2022-05-17 2:54 UTC (permalink / raw)
To: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Masami Hiramatsu, Paul E. McKenney
Cc: netdev, bpf, lkml, Martin KaFai Lau, Song Liu, John Fastabend,
KP Singh, Steven Rostedt
In-Reply-To: <20220515203653.4039075-1-jolsa@kernel.org>
On 5/15/22 1:36 PM, Jiri Olsa wrote:
> Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> in rcu 'not watching' context and if there's tracer attached to
> them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> warning like:
>
> [ 3.017540] WARNING: suspicious RCU usage
> ...
> [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> [ 3.018371] fprobe_handler.part.0+0xab/0x150
> [ 3.018374] 0xffffffffa00080c8
> [ 3.018393] ? arch_cpu_idle+0x5/0x10
> [ 3.018398] arch_cpu_idle+0x5/0x10
> [ 3.018399] default_idle_call+0x59/0x90
> [ 3.018401] do_idle+0x1c3/0x1d0
>
> The call path is following:
>
> default_idle_call
> rcu_idle_enter
> arch_cpu_idle
> rcu_idle_exit
>
> The arch_cpu_idle and rcu_idle_exit are the only ones from above
> path that are traceble and cause this problem on my setup.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> arch/x86/kernel/process.c | 2 +-
> kernel/rcu/tree.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index b370767f5b19..1345cb0124a6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> /*
> * Called from the generic idle code.
> */
> -void arch_cpu_idle(void)
> +void noinstr arch_cpu_idle(void)
noinstr includes a lot of attributes:
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
__no_kcsan __no_sanitize_address __no_profile
__no_sanitize_coverage
should we use notrace here?
> {
> x86_idle();
> }
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a4b8189455d5..20d529722f51 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> * If you add or remove a call to rcu_idle_exit(), be sure to test with
> * CONFIG_RCU_EQS_DEBUG=y.
> */
> -void rcu_idle_exit(void)
> +void noinstr rcu_idle_exit(void)
> {
> unsigned long flags;
>
^ permalink raw reply
* Re: [RFC Patch net-next v2 2/9] net: dsa: microchip: move ksz_chip_data to ksz_common
From: Florian Fainelli @ 2022-05-17 2:55 UTC (permalink / raw)
To: Arun Ramadoss, linux-kernel, netdev
Cc: Russell King, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vivien Didelot, Vladimir Oltean, David S. Miller, Jakub Kicinski,
Paolo Abeni, Oleksij Rempel, Marek Vasut, Michael Grzeschik,
Eric Dumazet
In-Reply-To: <20220513102219.30399-3-arun.ramadoss@microchip.com>
On 5/13/2022 3:22 AM, Arun Ramadoss wrote:
> This patch moves the ksz_chip_data in ksz8795 and ksz9477 to ksz_common.
> At present, the dev->chip_id is iterated with the ksz_chip_data and then
> copy its value to the ksz_dev structure. These values are declared as
> constant.
> Instead of copying the values and referencing it, this patch update the
> dev->info to the ksz_chip_data based on the chip_id in the init
> function. And also update the ksz_chip_data values for the LAN937x based
> switches.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [RFC Patch net-next v2 6/9] net: dsa: microchip: move get_strings to ksz_common
From: Florian Fainelli @ 2022-05-17 2:56 UTC (permalink / raw)
To: Arun Ramadoss, linux-kernel, netdev
Cc: Russell King, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vivien Didelot, Vladimir Oltean, David S. Miller, Jakub Kicinski,
Paolo Abeni, Oleksij Rempel, Marek Vasut, Michael Grzeschik,
Eric Dumazet
In-Reply-To: <20220513102219.30399-7-arun.ramadoss@microchip.com>
On 5/13/2022 3:22 AM, Arun Ramadoss wrote:
> ksz8795 and ksz9477 uses the same algorithm for copying the ethtool
> strings. Hence moved to ksz_common to remove the redundant code.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [RFC Patch net-next v2 7/9] net: dsa: move mib->cnt_ptr reset code to ksz_common.c
From: Florian Fainelli @ 2022-05-17 2:56 UTC (permalink / raw)
To: Arun Ramadoss, linux-kernel, netdev
Cc: Russell King, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vivien Didelot, Vladimir Oltean, David S. Miller, Jakub Kicinski,
Paolo Abeni, Oleksij Rempel, Marek Vasut, Michael Grzeschik,
Eric Dumazet
In-Reply-To: <20220513102219.30399-8-arun.ramadoss@microchip.com>
On 5/13/2022 3:22 AM, Arun Ramadoss wrote:
> From: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
>
> mib->cnt_ptr resetting is handled in multiple places as part of
> port_init_cnt(). Hence moved mib->cnt_ptr code to ksz common layer
> and removed from individual product files.
>
> Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [RFC Patch net-next v2 8/9] net: dsa: microchip: add the phylink get_caps
From: Florian Fainelli @ 2022-05-17 2:58 UTC (permalink / raw)
To: Arun Ramadoss, linux-kernel, netdev
Cc: Russell King, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vivien Didelot, Vladimir Oltean, David S. Miller, Jakub Kicinski,
Paolo Abeni, Oleksij Rempel, Marek Vasut, Michael Grzeschik,
Eric Dumazet
In-Reply-To: <20220513102219.30399-9-arun.ramadoss@microchip.com>
On 5/13/2022 3:22 AM, Arun Ramadoss wrote:
> This patch add the support for phylink_get_caps for ksz8795 and ksz9477
> series switch. It updates the struct ksz_switch_chip with the details of
> the internal phys and xmii interface. Then during the get_caps based on
> the bits set in the structure, corresponding phy mode is set.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Many ways to skin a cat^w report what a given port can do to phylink, I
would have probably used a bitmask for each type of interface, but this
works as well.
--
Florian
^ permalink raw reply
* Re: [RFC Patch net-next v2 9/9] net: dsa: microchip: remove unused members in ksz_device
From: Florian Fainelli @ 2022-05-17 2:58 UTC (permalink / raw)
To: Arun Ramadoss, linux-kernel, netdev
Cc: Russell King, Woojung Huh, UNGLinuxDriver, Andrew Lunn,
Vivien Didelot, Vladimir Oltean, David S. Miller, Jakub Kicinski,
Paolo Abeni, Oleksij Rempel, Marek Vasut, Michael Grzeschik,
Eric Dumazet
In-Reply-To: <20220513102219.30399-10-arun.ramadoss@microchip.com>
On 5/13/2022 3:22 AM, Arun Ramadoss wrote:
> The name, regs_size and overrides members in struct ksz_device are
> unused. Hence remove it.
> And host_mask is used in only place of ksz8795.c file, which can be
> replaced by dev->info->cpu_ports
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH bpf-next] selftests/bpf: fix some bugs in map_lookup_percpu_elem testcase
From: Yonghong Song @ 2022-05-17 3:09 UTC (permalink / raw)
To: Feng zhou, ast, daniel, andrii, kafai, songliubraving,
john.fastabend, kpsingh, rostedt, mingo, jolsa, davemarchevsky,
joannekoong, geliang.tang
Cc: netdev, bpf, linux-kernel, duanxiongchun, songmuchun,
wangdongdong.6, cong.wang, zhouchengming, yosryahmed
In-Reply-To: <20220516022453.68420-1-zhoufeng.zf@bytedance.com>
On 5/15/22 7:24 PM, Feng zhou wrote:
> From: Feng Zhou <zhoufeng.zf@bytedance.com>
>
> comments from Andrii Nakryiko, details in here:
> https://lore.kernel.org/lkml/20220511093854.411-1-zhoufeng.zf@bytedance.com/T/
>
> use /* */ instead of //
> use libbpf_num_possible_cpus() instead of sysconf(_SC_NPROCESSORS_ONLN)
> use 8 bytes for value size
> fix memory leak
> use ASSERT_EQ instead of ASSERT_OK
> add bpf_loop to fetch values on each possible CPU
>
> Fixes: ed7c13776e20c74486b0939a3c1de984c5efb6aa ("selftests/bpf: add test case for bpf_map_lookup_percpu_elem")
> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
LGTM with a few nits below.
Acked-by: Yonghong Song <yhs@fb.com>
> ---
> .../bpf/prog_tests/map_lookup_percpu_elem.c | 49 +++++++++------
> .../bpf/progs/test_map_lookup_percpu_elem.c | 61 ++++++++++++-------
> 2 files changed, 70 insertions(+), 40 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c b/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c
> index 58b24c2112b0..89ca170f1c25 100644
> --- a/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c
> +++ b/tools/testing/selftests/bpf/prog_tests/map_lookup_percpu_elem.c
> @@ -1,30 +1,39 @@
> -// SPDX-License-Identifier: GPL-2.0
> -// Copyright (c) 2022 Bytedance
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (c) 2022 Bytedance */
>
> #include <test_progs.h>
>
The above empty line is unnecessary.
> #include "test_map_lookup_percpu_elem.skel.h"
>
> -#define TEST_VALUE 1
> -
> void test_map_lookup_percpu_elem(void)
> {
> struct test_map_lookup_percpu_elem *skel;
> - int key = 0, ret;
> - int nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
> - int *buf;
> + __u64 key = 0, sum;
> + int ret, i;
> + int nr_cpus = libbpf_num_possible_cpus();
> + __u64 *buf;
>
> - buf = (int *)malloc(nr_cpus*sizeof(int));
> + buf = (__u64 *)malloc(nr_cpus*sizeof(__u64));
> if (!ASSERT_OK_PTR(buf, "malloc"))
> return;
> - memset(buf, 0, nr_cpus*sizeof(int));
> - buf[0] = TEST_VALUE;
>
> - skel = test_map_lookup_percpu_elem__open_and_load();
> - if (!ASSERT_OK_PTR(skel, "test_map_lookup_percpu_elem__open_and_load"))
> - return;
> + for (i=0; i<nr_cpus; i++)
> + buf[i] = i;
> + sum = (nr_cpus-1)*nr_cpus/2;
> +
> + skel = test_map_lookup_percpu_elem__open();
> + if (!ASSERT_OK_PTR(skel, "test_map_lookup_percpu_elem__open"))
> + goto exit;
> +
> + skel->rodata->nr_cpus = nr_cpus;
> +
> + ret = test_map_lookup_percpu_elem__load(skel);
> + if (!ASSERT_OK(ret, "test_map_lookup_percpu_elem__load"))
> + goto cleanup;
> +
> ret = test_map_lookup_percpu_elem__attach(skel);
> - ASSERT_OK(ret, "test_map_lookup_percpu_elem__attach");
> + if (!ASSERT_OK(ret, "test_map_lookup_percpu_elem__attach"))
> + goto cleanup;
>
> ret = bpf_map_update_elem(bpf_map__fd(skel->maps.percpu_array_map), &key, buf, 0);
> ASSERT_OK(ret, "percpu_array_map update");
> @@ -37,10 +46,14 @@ void test_map_lookup_percpu_elem(void)
>
> syscall(__NR_getuid);
>
> - ret = skel->bss->percpu_array_elem_val == TEST_VALUE &&
> - skel->bss->percpu_hash_elem_val == TEST_VALUE &&
> - skel->bss->percpu_lru_hash_elem_val == TEST_VALUE;
> - ASSERT_OK(!ret, "bpf_map_lookup_percpu_elem success");
> + test_map_lookup_percpu_elem__detach(skel);
> +
> + ASSERT_EQ(skel->bss->percpu_array_elem_sum, sum, "percpu_array lookup percpu elem");
> + ASSERT_EQ(skel->bss->percpu_hash_elem_sum, sum, "percpu_hash lookup percpu elem");
> + ASSERT_EQ(skel->bss->percpu_lru_hash_elem_sum, sum, "percpu_lru_hash lookup percpu elem");
>
> +cleanup:
> test_map_lookup_percpu_elem__destroy(skel);
> +exit:
> + free(buf);
> }
[...]
> +struct read_percpu_elem_ctx {
> + void *map;
> + __u64 sum;
> +};
> +
> +static int read_percpu_elem_callback(__u32 index, struct read_percpu_elem_ctx *ctx)
> +{
> + __u64 key = 0;
> + __u64 *value;
Please add an empty line here.
> + value = bpf_map_lookup_percpu_elem(ctx->map, &key, index);
> + if (value)
> + ctx->sum += *value;
> + return 0;
> +}
> +
[...]
^ permalink raw reply
* Re: [PATCH net-next] net: wwan: t7xx: fix GFP_KERNEL usage in spin_lock context
From: Ziyang Xuan (William) @ 2022-05-17 3:12 UTC (permalink / raw)
To: Martinez, Ricardo, Sergey Ryazanov
Cc: Devegowda, Chandrashekar, Intel Corporation, chiranjeevi.rapolu,
Haijun Liu (刘海军), M Chetan Kumar,
Loic Poulain, David Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, open list
In-Reply-To: <0b90f4f6-6911-017b-6d37-50354003900e@linux.intel.com>
>
> On 5/16/2022 1:36 PM, Sergey Ryazanov wrote:
>> Hello Ziyang,
>>
>> On Sat, May 14, 2022 at 11:57 AM Ziyang Xuan
>> <william.xuanziyang@huawei.com> wrote:
>>> t7xx_cldma_clear_rxq() call t7xx_cldma_alloc_and_map_skb() in spin_lock
>>> context, But __dev_alloc_skb() in t7xx_cldma_alloc_and_map_skb() uses
>>> GFP_KERNEL, that will introduce scheduling factor in spin_lock context.
>>>
>>> Replace GFP_KERNEL with GFP_ATOMIC to fix it.
>> Would not it will be more reliable to just rework
>> t7xx_cldma_clear_rxq() to avoid calling t7xx_cldma_alloc_and_map_skb()
>> under the spin lock instead of doing each allocation with GFP_ATOMIC?
>> E.g. t7xx_cldma_gpd_rx_from_q() calls t7xx_cldma_alloc_and_map_skb()
>> avoiding any lock holding.
>
> t7xx_cldma_clear_rxq() is a helper for t7xx_cldma_clear_all_qs() which is only called by t7xx_cldma_exception() after stopping CLDMA, so it should be OK to remove the spin lock from t7xx_cldma_clear_rxq().
>
OK, I see. Thus we can remove spink_lock and annotate it.
>
> .
^ permalink raw reply
* Re: [PATCH net-next] selftests: fib_nexthops: Make the test more robust
From: Shuah Khan @ 2022-05-17 3:15 UTC (permalink / raw)
To: Amit Cohen, netdev@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, shuah@kernel.org, mlxsw, dsahern@kernel.org,
Shuah Khan
In-Reply-To: <DM6PR12MB3066DCC2C0841FE01E081A27CBCC9@DM6PR12MB3066.namprd12.prod.outlook.com>
On 5/15/22 8:26 AM, Amit Cohen wrote:
>
>
>> -----Original Message-----
>> From: Shuah Khan <skhan@linuxfoundation.org>
>> Sent: Thursday, May 12, 2022 7:40 PM
>> To: Amit Cohen <amcohen@nvidia.com>; netdev@vger.kernel.org
>> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; shuah@kernel.org; mlxsw
>> <mlxsw@nvidia.com>; dsahern@kernel.org; Shuah Khan <skhan@linuxfoundation.org>
>> Subject: Re: [PATCH net-next] selftests: fib_nexthops: Make the test more robust
>>
>> On 5/12/22 9:17 AM, Amit Cohen wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Shuah Khan <skhan@linuxfoundation.org>
>>>> Sent: Thursday, May 12, 2022 5:28 PM
>>>> To: Amit Cohen <amcohen@nvidia.com>; netdev@vger.kernel.org
>>>> Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; shuah@kernel.org; mlxsw
>>>> <mlxsw@nvidia.com>; Shuah Khan <skhan@linuxfoundation.org>
>>>> Subject: Re: [PATCH net-next] selftests: fib_nexthops: Make the test more robust
>>>>
>>>> On 5/12/22 7:12 AM, Amit Cohen wrote:
>>>>> Rarely some of the test cases fail. Make the test more robust by increasing
>>>>> the timeout of ping commands to 5 seconds.
>>>>>
>>>>
>>>> Can you explain why test cases fail?
>>>
>>> The failures are probably caused due to slow forwarding performance.
>>> You can see similar commit - b6a4fd680042 ("selftests: forwarding: Make ping timeout configurable").
>>>
>>
>> My primary concern is that this patch simply changes the value
>> and doesn't make it configurable. Sounds like the above commit
>> does that. Why not use the same approach to keep it consistent.
>>
>>>>
>>>>> Signed-off-by: Amit Cohen <amcohen@nvidia.com>
>>>>> ---
>>>>> tools/testing/selftests/net/fib_nexthops.sh | 48 ++++++++++-----------
>>>>> 1 file changed, 24 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
>>>>> index b3bf5319bb0e..a99ee3fb2e13 100755
>>>>> --- a/tools/testing/selftests/net/fib_nexthops.sh
>>>>> +++ b/tools/testing/selftests/net/fib_nexthops.sh
>>>>> @@ -882,13 +882,13 @@ ipv6_fcnal_runtime()
>>>>> log_test $? 0 "Route delete"
>>>>>
>>>>> run_cmd "$IP ro add 2001:db8:101::1/128 nhid 81"
>>>>> - run_cmd "ip netns exec me ping -c1 -w1 2001:db8:101::1"
>>>>> + run_cmd "ip netns exec me ping -c1 -w5 2001:db8:101::1"
>>>>> log_test $? 0 "Ping with nexthop"
>>>>>
>>>> Looks like the change uses "-w deadline" - "-W timeout" might
>>>> be a better choice if ping fails with no response?
>>>
>>> We usually use "-w" in ping commands in selftests, but I can change it if you prefer "-W".
>>>
>>
>> I will defer to networking experts/maintainers on the choice of
>> -w vs -W
>>
>> I would like to see the timeout configurable.
>
> If you feel strongly about that, I can send another patch to make it configurable.
> Personally, I don't think that someone will use it, and from our experience, 5 seconds seems to be enough.
>
Please make this configurable. It will be consistent with other tests.
thanks,
-- Shuah
^ permalink raw reply
* Re: [PATCHv3 bpf-next] selftests/bpf: add missed ima_setup.sh in Makefile
From: Yonghong Song @ 2022-05-17 3:15 UTC (permalink / raw)
To: Hangbin Liu, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
Jakub Kicinski, David S . Miller, Mathieu Xhonneux, Lorenz Bauer,
William Tu, Toshiaki Makita, Jesper Dangaard Brouer
In-Reply-To: <20220516040020.653291-1-liuhangbin@gmail.com>
On 5/15/22 9:00 PM, Hangbin Liu wrote:
> When build bpf test and install it to another folder, e.g.
>
> make -j10 install -C tools/testing/selftests/ TARGETS="bpf" \
> SKIP_TARGETS="" INSTALL_PATH=/tmp/kselftests
>
> The ima_setup.sh is missed in target folder, which makes test_ima failed.
>
> Fix it by adding ima_setup.sh to TEST_PROGS_EXTENDED.
>
> Fixes: 34b82d3ac105 ("bpf: Add a selftest for bpf_ima_inode_hash")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
^ permalink raw reply
* Re: [PATCH RESEND net] bonding: fix missed rcu protection
From: Hangbin Liu @ 2022-05-17 3:42 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
David S . Miller, David Ahern, Jonathan Toppins, Eric Dumazet,
Paolo Abeni, syzbot+92beb3d46aab498710fa, Vladimir Oltean
In-Reply-To: <20220516181028.7dbbf918@kernel.org>
On Mon, May 16, 2022 at 06:10:28PM -0700, Jakub Kicinski wrote:
> Can't ->get_ts_info sleep now? It'd be a little sad to force it
> to be atomic just because of one upper dev trying to be fancy.
> Maybe all we need to do is to take a ref on the real_dev?
Do you mean
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38e152548126..b60450211579 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5591,16 +5591,20 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
const struct ethtool_ops *ops;
struct net_device *real_dev;
struct phy_device *phydev;
+ int ret = 0;
real_dev = bond_option_active_slave_get_rcu(bond);
if (real_dev) {
+ dev_hold(real_dev)
ops = real_dev->ethtool_ops;
phydev = real_dev->phydev;
if (phy_has_tsinfo(phydev)) {
- return phy_ts_info(phydev, info);
+ ret = phy_ts_info(phydev, info);
+ goto out;
} else if (ops->get_ts_info) {
- return ops->get_ts_info(real_dev, info);
+ ret = ops->get_ts_info(real_dev, info);
+ goto out;
}
}
@@ -5608,7 +5612,10 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
SOF_TIMESTAMPING_SOFTWARE;
info->phc_index = -1;
- return 0;
+out:
+ if (real_dev)
+ dev_put(real_dev);
+ return ret;
}
This look OK to me.
Vladimir, Jay, WDYT?
>
> Also please add a Link: to the previous discussion, it'd have been
> useful to get the context in which Vladimir suggested this.
OK, I will.
Thanks
Hangbin
^ permalink raw reply related
* [PATCH] bpf: add access control for map
From: menglong8.dong @ 2022-05-17 3:41 UTC (permalink / raw)
To: ast; +Cc: netdev, bpf, linux-kernel, Menglong Dong
From: Menglong Dong <imagedong@tencent.com>
Hello,
I have a idea about the access control of eBPF map, could you help
to see if it works?
For now, only users with the CAP_SYS_ADMIN or CAP_BPF have the right
to access the data in eBPF maps. So I'm thinking, are there any way
to control the access to the maps, just like what we do to files?
Therefore, we can decide who have the right to read the map and who
can write.
I think it is useful in some case. For example, I made a eBPF-based
network statistics program, and the information is stored in an array
map. And I want all users can read the information in the map, without
changing the capacity of them. As the information is iunsensitive,
normal users can read it. This make publish-consume mode possible,
the eBPF program is publisher and the user space program is consumer.
So this aim can be achieve, if we can control the access of maps as a
file. There are many ways I thought, and I choosed one to implement:
While pining the map, add the inode that is created to a list on the
map. root can change the permission of the inode through the pin path.
Therefore, we can try to find the inode corresponding to current user
namespace in the list, and check whether user have permission to read
or write.
The steps can be:
1. create the map with BPF_F_UMODE flags, which imply that enable
access control in this map.
2. load and pin the map on /sys/fs/bpf/xxx.
3. change the umode of /sys/fs/bpf/xxx with 'chmod 744 /sys/fs/bpf/xxx',
therefor all user can read the map.
I'm not sure if there is already way to achieve this aim, this is just
a idea and the code is totally not ok (it will panic when unpin the map,
seems the usage of my RCU lock is wrong)
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
include/linux/bpf.h | 7 ++++
include/uapi/linux/bpf.h | 1 +
kernel/bpf/arraymap.c | 2 +-
kernel/bpf/inode.c | 60 +++++++++++++++++++++++++++++++---
kernel/bpf/syscall.c | 69 +++++++++++++++++++++++++++++++++++++---
5 files changed, 130 insertions(+), 9 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index be94833d390a..34cc4f99df49 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -190,6 +190,11 @@ struct bpf_map_off_arr {
u8 field_sz[BPF_MAP_OFF_ARR_MAX];
};
+struct bpf_map_inode {
+ struct list_head list;
+ struct inode *inode;
+};
+
struct bpf_map {
/* The first two cachelines with read-mostly members of which some
* are also accessed in fast-path (e.g. ops, max_entries).
@@ -205,6 +210,7 @@ struct bpf_map {
u32 max_entries;
u64 map_extra; /* any per-map-type extra fields */
u32 map_flags;
+ struct list_head inode_list;
int spin_lock_off; /* >=0 valid offset, <0 error */
struct bpf_map_value_off *kptr_off_tab;
int timer_off; /* >=0 valid offset, <0 error */
@@ -2345,5 +2351,6 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
u32 **bin_buf, u32 num_args);
void bpf_bprintf_cleanup(void);
+int bpf_map_permission(struct bpf_map *map, int flags);
#endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 444fe6f1cf35..f5a47ca486d8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1225,6 +1225,7 @@ enum {
/* Create a map that is suitable to be an inner map with dynamic max entries */
BPF_F_INNER_MAP = (1U << 12),
+ BPF_F_UMODE = (1U << 13),
};
/* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index b3bf31fd9458..9e00634070b0 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -17,7 +17,7 @@
#define ARRAY_CREATE_FLAG_MASK \
(BPF_F_NUMA_NODE | BPF_F_MMAPABLE | BPF_F_ACCESS_MASK | \
- BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP)
+ BPF_F_PRESERVE_ELEMS | BPF_F_INNER_MAP | BPF_F_UMODE)
static void bpf_array_free_percpu(struct bpf_array *array)
{
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 4f841e16779e..bfe3507fdefd 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -334,6 +334,8 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
{
struct inode *dir = dentry->d_parent->d_inode;
struct inode *inode = bpf_get_inode(dir->i_sb, dir, mode);
+ struct bpf_map_inode *map_inode;
+ struct bpf_map *map;
if (IS_ERR(inode))
return PTR_ERR(inode);
@@ -341,6 +343,19 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
inode->i_fop = fops;
inode->i_private = raw;
+ if (iops != &bpf_map_iops)
+ goto out;
+
+ map = raw;
+ map_inode = kmalloc(sizeof(*map_inode), GFP_KERNEL);
+ if (!map_inode) {
+ free_inode_nonrcu(inode);
+ return -ENOMEM;
+ }
+ map_inode->inode = inode;
+ list_add_rcu(&map_inode->list, &map->inode_list);
+
+out:
bpf_dentry_finalize(dentry, inode, dir);
return 0;
}
@@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
if (IS_ERR(raw))
return PTR_ERR(raw);
- if (type == BPF_TYPE_PROG)
+ if (type != BPF_TYPE_MAP && !bpf_capable())
+ return -EPERM;
+
+ switch (type) {
+ case BPF_TYPE_PROG:
ret = bpf_prog_new_fd(raw);
- else if (type == BPF_TYPE_MAP)
+ break;
+ case BPF_TYPE_MAP:
+ if (bpf_map_permission(raw, f_flags)) {
+ bpf_any_put(raw, type);
+ return -EPERM;
+ }
ret = bpf_map_new_fd(raw, f_flags);
- else if (type == BPF_TYPE_LINK)
+ break;
+ case BPF_TYPE_LINK:
ret = (f_flags != O_RDWR) ? -EINVAL : bpf_link_new_fd(raw);
- else
+ break;
+ default:
return -ENOENT;
+ }
if (ret < 0)
bpf_any_put(raw, type);
@@ -610,6 +637,27 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
return 0;
}
+static void bpf_map_inode_remove(struct bpf_map *map, struct inode *inode)
+{
+ struct bpf_map_inode *map_inode;
+
+ if (!(map->map_flags & BPF_F_UMODE))
+ return;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
+ if (map_inode->inode == inode)
+ goto found;
+ }
+ rcu_read_unlock();
+ return;
+found:
+ rcu_read_unlock();
+ list_del_rcu(&map_inode->list);
+ synchronize_rcu();
+ kfree(map_inode);
+}
+
static void bpf_free_inode(struct inode *inode)
{
enum bpf_type type;
@@ -618,6 +666,10 @@ static void bpf_free_inode(struct inode *inode)
kfree(inode->i_link);
if (!bpf_inode_type(inode, &type))
bpf_any_put(inode->i_private, type);
+
+ if (type == BPF_TYPE_MAP)
+ bpf_map_inode_remove(inode->i_private, inode);
+
free_inode_nonrcu(inode);
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e0aead17dff4..1fd9b22e95ff 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -586,6 +586,16 @@ void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
}
}
+static void bpf_map_inode_release(struct bpf_map *map)
+{
+ struct bpf_map_inode *cur, *prev;
+
+ list_for_each_entry_safe(cur, prev, &map->inode_list, list) {
+ list_del(&cur->list);
+ kfree(cur);
+ }
+}
+
/* called from workqueue */
static void bpf_map_free_deferred(struct work_struct *work)
{
@@ -594,6 +604,8 @@ static void bpf_map_free_deferred(struct work_struct *work)
security_bpf_map_free(map);
kfree(map->off_arr);
bpf_map_release_memcg(map);
+ bpf_map_inode_release(map);
+
/* implementation dependent freeing, map_free callback also does
* bpf_map_free_kptr_off_tab, if needed.
*/
@@ -1092,6 +1104,7 @@ static int map_create(union bpf_attr *attr)
atomic64_set(&map->usercnt, 1);
mutex_init(&map->freeze_mutex);
spin_lock_init(&map->owner.lock);
+ INIT_LIST_HEAD(&map->inode_list);
map->spin_lock_off = -EINVAL;
map->timer_off = -EINVAL;
@@ -3707,6 +3720,30 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
return fd;
}
+int bpf_map_permission(struct bpf_map *map, int flags)
+{
+ struct bpf_map_inode *map_inode;
+ struct user_namespace *ns;
+
+ if (capable(CAP_SYS_ADMIN))
+ return 0;
+
+ if (!(map->map_flags & BPF_F_UMODE))
+ return -1;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
+ ns = map_inode->inode->i_sb->s_user_ns;
+ if (ns == current_user_ns())
+ goto found;
+ }
+ rcu_read_unlock();
+ return -1;
+found:
+ rcu_read_unlock();
+ return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
+}
+
#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD open_flags
static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
@@ -3720,9 +3757,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
attr->open_flags & ~BPF_OBJ_FLAG_MASK)
return -EINVAL;
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
-
f_flags = bpf_get_file_flag(attr->open_flags);
if (f_flags < 0)
return f_flags;
@@ -3738,6 +3772,11 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
if (IS_ERR(map))
return PTR_ERR(map);
+ if (bpf_map_permission(map, f_flags)) {
+ bpf_map_put_with_uref(map);
+ return -EPERM;
+ }
+
fd = bpf_map_new_fd(map, f_flags);
if (fd < 0)
bpf_map_put_with_uref(map);
@@ -4844,12 +4883,34 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
return ret;
}
+static inline bool is_map_ops_cmd(int cmd)
+{
+ switch (cmd) {
+ case BPF_MAP_LOOKUP_ELEM:
+ case BPF_MAP_UPDATE_ELEM:
+ case BPF_MAP_DELETE_ELEM:
+ case BPF_MAP_GET_NEXT_KEY:
+ case BPF_MAP_FREEZE:
+ case BPF_OBJ_GET:
+ case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+ case BPF_MAP_LOOKUP_BATCH:
+ case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
+ case BPF_MAP_UPDATE_BATCH:
+ case BPF_MAP_DELETE_BATCH:
+ case BPF_OBJ_GET_INFO_BY_FD:
+ return true;
+ default:
+ return false;
+ }
+}
+
static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
{
union bpf_attr attr;
int err;
- if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+ if (sysctl_unprivileged_bpf_disabled && !bpf_capable() &&
+ !is_map_ops_cmd(cmd))
return -EPERM;
err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
--
2.36.1
^ permalink raw reply related
* Re: [PATCH v3 3/4] can: skb:: move can_dropped_invalid_skb and can_skb_headroom_valid to skb.c
From: Max Staudt @ 2022-05-17 4:12 UTC (permalink / raw)
To: Vincent MAILHOL
Cc: Oliver Hartkopp, Marc Kleine-Budde, linux-can, linux-kernel,
netdev
In-Reply-To: <CAMZ6RqKZMHXB7rQ70GrXcVE7x7kytAAGfE+MOpSgWgWgp0gD2g@mail.gmail.com>
On Tue, 17 May 2022 10:50:16 +0900
Vincent MAILHOL <mailhol.vincent@wanadoo.fr> wrote:
> My concern is: what would be the merrit? If we do not split, the users
> of slcan and v(x)can would have to load the can-dev module which will
> be slightly bloated for their use, but is this really an issue? I do
> not see how this can become a performance bottleneck, so what is the
> problem?
> I could also argue that most of the devices do not depend on
> rx-offload.o. So should we also split this one out of can-dev on the
> same basis and add another module dependency?
> The benefit (not having to load a bloated module for three drivers)
> does not outweigh the added complexity: all hardware modules will have
> one additional modprobe dependency on the tiny can-skb module.
>
> But as said above, I am not fully opposed to the split, I am just
> strongly divided. If we go for the split, creating a can-skb module is
> the natural and only option I see.
> If the above argument does not convince you, I will send a v3 with
> that split.
I originally replied to PATCHv2 in agreement with what Vincent said in
the first part - I agree that simply moving this code into can-dev and
making v(x)can/slcan depend on it is the straightforward thing to do.
Having yet another kernel module for a tiny bit of code adds more
unnecessary complexity IMHO. And from a user perspective, it seems
fairly natural to have can-dev loaded any time some can0, slcan0,
vcan0, etc. pops up.
Finally, embedded applications that are truly short on memory are
likely using a "real" CAN adapter, and hence already have can-dev
loaded anyway.
I think it's okay to just move it to can-dev and call it a day :)
Max
^ permalink raw reply
* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Kalle Valo @ 2022-05-17 4:36 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, johannes, alex.aring, stefan,
mareklindner, sw, a, sven, linux-wireless, linux-wpan
In-Reply-To: <20220516215638.1787257-1-kuba@kernel.org>
Jakub Kicinski <kuba@kernel.org> writes:
> Most protocol-specific pointers in struct net_device are under
> a respective ifdef. Wireless is the notable exception. Since
> there's a sizable number of custom-built kernels for datacenter
> workloads which don't build wireless it seems reasonable to
> ifdefy those pointers as well.
>
> While at it move IPv4 and IPv6 pointers up, those are special
> for obvious reasons.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: johannes@sipsolutions.net
> CC: alex.aring@gmail.com
> CC: stefan@datenfreihafen.org
> CC: mareklindner@neomailbox.ch
> CC: sw@simonwunderlich.de
> CC: a@unstable.cc
> CC: sven@narfation.org
> CC: linux-wireless@vger.kernel.org
> CC: linux-wpan@vger.kernel.org
[...]
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -8004,10 +8004,7 @@ int cfg80211_register_netdevice(struct net_device *dev);
> *
> * Requires the RTNL and wiphy mutex to be held.
> */
> -static inline void cfg80211_unregister_netdevice(struct net_device *dev)
> -{
> - cfg80211_unregister_wdev(dev->ieee80211_ptr);
> -}
> +void cfg80211_unregister_netdevice(struct net_device *dev);
>
> /**
> * struct cfg80211_ft_event_params - FT Information Elements
[...]
> --- a/net/wireless/core.c
> +++ b/net/wireless/core.c
> @@ -1374,6 +1374,12 @@ int cfg80211_register_netdevice(struct net_device *dev)
> }
> EXPORT_SYMBOL(cfg80211_register_netdevice);
>
> +void cfg80211_unregister_netdevice(struct net_device *dev)
> +{
> + cfg80211_unregister_wdev(dev->ieee80211_ptr);
> +}
> +EXPORT_SYMBOL(cfg80211_unregister_netdevice);
Why moving this to a proper function? Just curious, I couldn't figure it
out.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* RE: [EXT] Re: [net-next PATCH V2] octeontx2-pf: Add support for adaptive interrupt coalescing
From: Suman Ghosh @ 2022-05-17 4:39 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
Sunil Kovvuri Goutham, Subbaraya Sundeep Bhatta,
Geethasowjanya Akula, Sunil.Goutham@cavium.com, Hariprasad Kelam,
colin.king@intel.com, netdev@vger.kernel.org
In-Reply-To: <20220516112817.4f7d99cf@kernel.org>
>> @@ -1230,7 +1262,7 @@ static int otx2_set_link_ksettings(struct
>> net_device *netdev,
>>
>> static const struct ethtool_ops otx2_ethtool_ops = {
>> .supported_coalesce_params = ETHTOOL_COALESCE_USECS |
>> - ETHTOOL_COALESCE_MAX_FRAMES,
>> + ETHTOOL_COALESCE_MAX_FRAMES |
>ETHTOOL_COALESCE_USE_ADAPTIVE,
>
>Ah there it is. Now you actually tested it with upstream :/
>
>Please wrap the line.
[Suman] Done.
>
>Also please reject user trying to set asymmetric adaptive mode since you
>don't seem to support it:
[Suman] Done
>
>> + /* Check and update coalesce status */
>> + if ((pfvf->flags & OTX2_FLAG_ADPTV_INT_COAL_ENABLED) ==
>> + OTX2_FLAG_ADPTV_INT_COAL_ENABLED) {
>> + priv_coalesce_status = 1;
>> + if (!ec->use_adaptive_rx_coalesce || !ec-
>>use_adaptive_tx_coalesce)
>> + pfvf->flags &= ~OTX2_FLAG_ADPTV_INT_COAL_ENABLED;
>> + } else {
>> + priv_coalesce_status = 0;
>> + if (ec->use_adaptive_rx_coalesce || ec-
>>use_adaptive_tx_coalesce)
>> + pfvf->flags |= OTX2_FLAG_ADPTV_INT_COAL_ENABLED;
>
>If I'm reading this right user doing:
>
>ethtool -C eth0 adaptive-rx on adaptive-tx off
>
>multiple times will keep switching between adaptive and non-adaptive
>mode. This will be super confusing to automation which may assume
>configuration commands are idempotent.
[Suman] I see your point now. Updated the code to make sure adaptive-rx is same as adaptive-tx.
^ permalink raw reply
* [net-next PATCH V3] octeontx2-pf: Add support for adaptive interrupt coalescing
From: Suman Ghosh @ 2022-05-17 4:40 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, sgoutham, sbhatta, gakula,
Sunil.Goutham, hkelam, colin.king, netdev
Cc: Suman Ghosh
Added support for adaptive IRQ coalescing. It uses net_dim
algorithm to find the suitable delay/IRQ count based on the
current packet rate.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
---
Changes since V2
- Addressed review comments.
.../net/ethernet/marvell/octeontx2/Kconfig | 1 +
.../marvell/octeontx2/nic/otx2_common.c | 5 ---
.../marvell/octeontx2/nic/otx2_common.h | 10 +++++
.../marvell/octeontx2/nic/otx2_ethtool.c | 45 +++++++++++++++++--
.../ethernet/marvell/octeontx2/nic/otx2_pf.c | 22 +++++++++
.../marvell/octeontx2/nic/otx2_txrx.c | 23 ++++++++++
.../marvell/octeontx2/nic/otx2_txrx.h | 1 +
7 files changed, 99 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/Kconfig b/drivers/net/ethernet/marvell/octeontx2/Kconfig
index 639893d87055..e1036b0eb6b1 100644
--- a/drivers/net/ethernet/marvell/octeontx2/Kconfig
+++ b/drivers/net/ethernet/marvell/octeontx2/Kconfig
@@ -33,6 +33,7 @@ config OCTEONTX2_PF
select OCTEONTX2_MBOX
select NET_DEVLINK
depends on (64BIT && COMPILE_TEST) || ARM64
+ select DIMLIB
depends on PCI
depends on PTP_1588_CLOCK_OPTIONAL
help
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index b9d7601138ca..fb8db5888d2f 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -97,11 +97,6 @@ void otx2_get_dev_stats(struct otx2_nic *pfvf)
{
struct otx2_dev_stats *dev_stats = &pfvf->hw.dev_stats;
-#define OTX2_GET_RX_STATS(reg) \
- otx2_read64(pfvf, NIX_LF_RX_STATX(reg))
-#define OTX2_GET_TX_STATS(reg) \
- otx2_read64(pfvf, NIX_LF_TX_STATX(reg))
-
dev_stats->rx_bytes = OTX2_GET_RX_STATS(RX_OCTS);
dev_stats->rx_drops = OTX2_GET_RX_STATS(RX_DROP);
dev_stats->rx_bcast_frames = OTX2_GET_RX_STATS(RX_BCAST);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index c587c14ac2a3..ce2766317c0b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -18,6 +18,7 @@
#include <net/pkt_cls.h>
#include <net/devlink.h>
#include <linux/time64.h>
+#include <linux/dim.h>
#include <mbox.h>
#include <npc.h>
@@ -54,6 +55,11 @@ enum arua_mapped_qtypes {
/* Send skid of 2000 packets required for CQ size of 4K CQEs. */
#define SEND_CQ_SKID 2000
+#define OTX2_GET_RX_STATS(reg) \
+ otx2_read64(pfvf, NIX_LF_RX_STATX(reg))
+#define OTX2_GET_TX_STATS(reg) \
+ otx2_read64(pfvf, NIX_LF_TX_STATX(reg))
+
struct otx2_lmt_info {
u64 lmt_addr;
u16 lmt_id;
@@ -351,6 +357,7 @@ struct otx2_nic {
#define OTX2_FLAG_TC_MATCHALL_EGRESS_ENABLED BIT_ULL(12)
#define OTX2_FLAG_TC_MATCHALL_INGRESS_ENABLED BIT_ULL(13)
#define OTX2_FLAG_DMACFLTR_SUPPORT BIT_ULL(14)
+#define OTX2_FLAG_ADPTV_INT_COAL_ENABLED BIT_ULL(16)
u64 flags;
u64 *cq_op_addr;
@@ -408,6 +415,9 @@ struct otx2_nic {
u8 pfc_en;
u8 *queue_to_pfc_map;
#endif
+
+ /* napi event count. It is needed for adaptive irq coalescing. */
+ u32 napi_events;
};
static inline bool is_otx2_lbkvf(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
index fc328de5345e..bc614a4def9e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
@@ -455,6 +455,14 @@ static int otx2_get_coalesce(struct net_device *netdev,
cmd->rx_max_coalesced_frames = hw->cq_ecount_wait;
cmd->tx_coalesce_usecs = hw->cq_time_wait;
cmd->tx_max_coalesced_frames = hw->cq_ecount_wait;
+ if ((pfvf->flags & OTX2_FLAG_ADPTV_INT_COAL_ENABLED) ==
+ OTX2_FLAG_ADPTV_INT_COAL_ENABLED) {
+ cmd->use_adaptive_rx_coalesce = 1;
+ cmd->use_adaptive_tx_coalesce = 1;
+ } else {
+ cmd->use_adaptive_rx_coalesce = 0;
+ cmd->use_adaptive_tx_coalesce = 0;
+ }
return 0;
}
@@ -466,11 +474,30 @@ static int otx2_set_coalesce(struct net_device *netdev,
{
struct otx2_nic *pfvf = netdev_priv(netdev);
struct otx2_hw *hw = &pfvf->hw;
+ u8 priv_coalesce_status;
int qidx;
if (!ec->rx_max_coalesced_frames || !ec->tx_max_coalesced_frames)
return 0;
+ if (ec->use_adaptive_rx_coalesce != ec->use_adaptive_tx_coalesce) {
+ netdev_err(netdev,
+ "adaptive-rx should be same as adaptive-tx");
+ return -EINVAL;
+ }
+
+ /* Check and update coalesce status */
+ if ((pfvf->flags & OTX2_FLAG_ADPTV_INT_COAL_ENABLED) ==
+ OTX2_FLAG_ADPTV_INT_COAL_ENABLED) {
+ priv_coalesce_status = 1;
+ if (!ec->use_adaptive_rx_coalesce)
+ pfvf->flags &= ~OTX2_FLAG_ADPTV_INT_COAL_ENABLED;
+ } else {
+ priv_coalesce_status = 0;
+ if (ec->use_adaptive_rx_coalesce)
+ pfvf->flags |= OTX2_FLAG_ADPTV_INT_COAL_ENABLED;
+ }
+
/* 'cq_time_wait' is 8bit and is in multiple of 100ns,
* so clamp the user given value to the range of 1 to 25usec.
*/
@@ -494,9 +521,9 @@ static int otx2_set_coalesce(struct net_device *netdev,
* so clamp the user given value to the range of 1 to 64k.
*/
ec->rx_max_coalesced_frames = clamp_t(u32, ec->rx_max_coalesced_frames,
- 1, U16_MAX);
+ 1, NAPI_POLL_WEIGHT);
ec->tx_max_coalesced_frames = clamp_t(u32, ec->tx_max_coalesced_frames,
- 1, U16_MAX);
+ 1, NAPI_POLL_WEIGHT);
/* Rx and Tx are mapped to same CQ, check which one
* is changed, if both then choose the min.
@@ -509,6 +536,17 @@ static int otx2_set_coalesce(struct net_device *netdev,
hw->cq_ecount_wait = min_t(u16, ec->rx_max_coalesced_frames,
ec->tx_max_coalesced_frames);
+ /* Reset 'cq_time_wait' and 'cq_ecount_wait' to
+ * default values if coalesce status changed from
+ * 'on' to 'off'.
+ */
+ if (priv_coalesce_status &&
+ ((pfvf->flags & OTX2_FLAG_ADPTV_INT_COAL_ENABLED) !=
+ OTX2_FLAG_ADPTV_INT_COAL_ENABLED)) {
+ hw->cq_time_wait = CQ_TIMER_THRESH_DEFAULT;
+ hw->cq_ecount_wait = CQ_CQE_THRESH_DEFAULT;
+ }
+
if (netif_running(netdev)) {
for (qidx = 0; qidx < pfvf->hw.cint_cnt; qidx++)
otx2_config_irq_coalescing(pfvf, qidx);
@@ -1230,7 +1268,8 @@ static int otx2_set_link_ksettings(struct net_device *netdev,
static const struct ethtool_ops otx2_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
- ETHTOOL_COALESCE_MAX_FRAMES,
+ ETHTOOL_COALESCE_MAX_FRAMES |
+ ETHTOOL_COALESCE_USE_ADAPTIVE,
.supported_ring_params = ETHTOOL_RING_USE_RX_BUF_LEN |
ETHTOOL_RING_USE_CQE_SIZE,
.get_link = otx2_get_link,
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index 53b2706d65a1..a7e919b81a2e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1254,6 +1254,7 @@ static irqreturn_t otx2_cq_intr_handler(int irq, void *cq_irq)
otx2_write64(pf, NIX_LF_CINTX_ENA_W1C(qidx), BIT_ULL(0));
/* Schedule NAPI */
+ pf->napi_events++;
napi_schedule_irqoff(&cq_poll->napi);
return IRQ_HANDLED;
@@ -1267,6 +1268,7 @@ static void otx2_disable_napi(struct otx2_nic *pf)
for (qidx = 0; qidx < pf->hw.cint_cnt; qidx++) {
cq_poll = &qset->napi[qidx];
+ cancel_work_sync(&cq_poll->dim.work);
napi_disable(&cq_poll->napi);
netif_napi_del(&cq_poll->napi);
}
@@ -1546,6 +1548,24 @@ static void otx2_do_set_rx_mode(struct otx2_nic *pf)
mutex_unlock(&pf->mbox.lock);
}
+static void otx2_dim_work(struct work_struct *w)
+{
+ struct dim_cq_moder cur_moder;
+ struct otx2_cq_poll *cq_poll;
+ struct otx2_nic *pfvf;
+ struct dim *dim;
+
+ dim = container_of(w, struct dim, work);
+ cur_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+ cq_poll = container_of(dim, struct otx2_cq_poll, dim);
+ pfvf = (struct otx2_nic *)cq_poll->dev;
+ pfvf->hw.cq_time_wait = (cur_moder.usec > CQ_TIMER_THRESH_MAX) ?
+ CQ_TIMER_THRESH_MAX : cur_moder.usec;
+ pfvf->hw.cq_ecount_wait = (cur_moder.pkts > NAPI_POLL_WEIGHT) ?
+ NAPI_POLL_WEIGHT : cur_moder.pkts;
+ dim->state = DIM_START_MEASURE;
+}
+
int otx2_open(struct net_device *netdev)
{
struct otx2_nic *pf = netdev_priv(netdev);
@@ -1612,6 +1632,8 @@ int otx2_open(struct net_device *netdev)
cq_poll->cq_ids[CQ_XDP] = CINT_INVALID_CQ;
cq_poll->dev = (void *)pf;
+ cq_poll->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_CQE;
+ INIT_WORK(&cq_poll->dim.work, otx2_dim_work);
netif_napi_add(netdev, &cq_poll->napi,
otx2_napi_handler, NAPI_POLL_WEIGHT);
napi_enable(&cq_poll->napi);
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
index c26de15b2ac3..3baeafc40807 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
@@ -484,6 +484,18 @@ static int otx2_tx_napi_handler(struct otx2_nic *pfvf,
return 0;
}
+static void otx2_adjust_adaptive_coalese(struct otx2_nic *pfvf, struct otx2_cq_poll *cq_poll)
+{
+ struct dim_sample dim_sample;
+ u64 rx_frames, rx_bytes;
+
+ rx_frames = OTX2_GET_RX_STATS(RX_BCAST) + OTX2_GET_RX_STATS(RX_MCAST) +
+ OTX2_GET_RX_STATS(RX_UCAST);
+ rx_bytes = OTX2_GET_RX_STATS(RX_OCTS);
+ dim_update_sample(pfvf->napi_events, rx_frames, rx_bytes, &dim_sample);
+ net_dim(&cq_poll->dim, dim_sample);
+}
+
int otx2_napi_handler(struct napi_struct *napi, int budget)
{
struct otx2_cq_queue *rx_cq = NULL;
@@ -521,6 +533,17 @@ int otx2_napi_handler(struct napi_struct *napi, int budget)
if (pfvf->flags & OTX2_FLAG_INTF_DOWN)
return workdone;
+ /* Check for adaptive interrupt coalesce */
+ if (workdone != 0 &&
+ ((pfvf->flags & OTX2_FLAG_ADPTV_INT_COAL_ENABLED) ==
+ OTX2_FLAG_ADPTV_INT_COAL_ENABLED)) {
+ /* Adjust irq coalese using net_dim */
+ otx2_adjust_adaptive_coalese(pfvf, cq_poll);
+ /* Update irq coalescing */
+ for (i = 0; i < pfvf->hw.cint_cnt; i++)
+ otx2_config_irq_coalescing(pfvf, i);
+ }
+
/* Re-enable interrupts */
otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
BIT_ULL(0));
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
index f1a04cf9210c..c88e8a436029 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.h
@@ -109,6 +109,7 @@ struct otx2_cq_poll {
#define CINT_INVALID_CQ 255
u8 cint_idx;
u8 cq_ids[CQS_PER_CINT];
+ struct dim dim;
struct napi_struct napi;
};
--
2.25.1
^ permalink raw reply related
* RE: [Intel-wired-lan] [PATCH] igb: skip phy status check where unavailable
From: G, GurucharanX @ 2022-05-17 4:51 UTC (permalink / raw)
To: Kevin Mitchell
Cc: intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
Takuma Ueba, Jeff Kirsher, netdev@vger.kernel.org, Jakub Kicinski,
Paolo Abeni, David S. Miller
In-Reply-To: <20220429235554.13290-1-kevmitch@arista.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Kevin Mitchell
> Sent: Saturday, April 30, 2022 5:26 AM
> Cc: kevmitch@arista.com; intel-wired-lan@lists.osuosl.org; linux-
> kernel@vger.kernel.org; Takuma Ueba <t.ueba11@gmail.com>; Jeff Kirsher
> <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH] igb: skip phy status check where
> unavailable
>
> igb_read_phy_reg() will silently return, leaving phy_data untouched, if
> hw->ops.read_reg isn't set. Depending on the uninitialized value of
> phy_data, this led to the phy status check either succeeding immediately or
> looping continuously for 2 seconds before emitting a noisy err-level timeout.
> This message went out to the console even though there was no actual
> problem.
>
> Instead, first check if there is read_reg function pointer. If not, proceed
> without trying to check the phy status register.
>
> Fixes: b72f3f72005d ("igb: When GbE link up, wait for Remote receiver status
> condition")
> Signed-off-by: Kevin Mitchell <kevmitch@arista.com>
> ---
> drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
^ permalink raw reply
* Re: [PATCH bpf-next v4 5/7] selftests/bpf: verify token of struct mptcp_sock
From: Geliang Tang @ 2022-05-17 5:19 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Mat Martineau, Geliang Tang, Networking, bpf, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, MPTCP Upstream, Matthieu Baerts
In-Reply-To: <20220517013217.oxbpwjilwr4fzvuv@kafai-mbp.dhcp.thefacebook.com>
Martin KaFai Lau <kafai@fb.com> 于2022年5月17日周二 09:32写道:
>
> On Fri, May 13, 2022 at 03:48:25PM -0700, Mat Martineau wrote:
> [ ... ]
>
> > +/*
> > + * Parse the token from the output of 'ip mptcp monitor':
> > + *
> > + * [ CREATED] token=3ca933d3 remid=0 locid=0 saddr4=127.0.0.1 ...
> > + * [ CREATED] token=2ab57040 remid=0 locid=0 saddr4=127.0.0.1 ...
> How stable is the string format ?
>
> If it happens to have some unrelated mptcp connection going on, will the test
> break ?
Hi Martin,
Yes, it will break in that case. How can I fix this? Should I run the
test in a special net namespace?
'ip mptcp monitor' can easily run in a special net namespace:
ip -net ns1 mptcp monitor
But I don't know how to run start_server() and connect_to_fd() in a
special namespace. Could you please give me some suggestions about
this?
Thanks,
-Geliang
>
> > + */
> > +static __u32 get_msk_token(void)
> > +{
> > + char *prefix = "[ CREATED] token=";
> > + char buf[BUFSIZ] = {};
> > + __u32 token = 0;
> > + ssize_t len;
> > + int fd;
> > +
> > + sync();
> > +
> > + fd = open(monitor_log_path, O_RDONLY);
> > + if (!ASSERT_GE(fd, 0, "Failed to open monitor_log_path"))
> > + return token;
> > +
> > + len = read(fd, buf, sizeof(buf));
> > + if (!ASSERT_GT(len, 0, "Failed to read monitor_log_path"))
> > + goto err;
> > +
> > + if (strncmp(buf, prefix, strlen(prefix))) {
> > + log_err("Invalid prefix %s", buf);
> > + goto err;
> > + }
> > +
> > + token = strtol(buf + strlen(prefix), NULL, 16);
> > +
> > +err:
> > + close(fd);
> > + return token;
> > +}
> > +
>
^ permalink raw reply
* Re: [PATCH bpf-next v4 1/7] bpf: add bpf_skc_to_mptcp_sock_proto
From: Geliang Tang @ 2022-05-17 5:26 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Geliang Tang, Mat Martineau, Networking, bpf, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, MPTCP Upstream,
Nicolas Rybowski, Matthieu Baerts
In-Reply-To: <20220517010730.mmv6u2h25xyz4uwl@kafai-mbp.dhcp.thefacebook.com>
Martin KaFai Lau <kafai@fb.com> 于2022年5月17日周二 09:07写道:
>
> On Fri, May 13, 2022 at 03:48:21PM -0700, Mat Martineau wrote:
> [ ... ]
>
> > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > index 8b1afd6f5cc4..2ba09de955c7 100644
> > --- a/include/net/mptcp.h
> > +++ b/include/net/mptcp.h
> > @@ -284,4 +284,10 @@ static inline int mptcpv6_init(void) { return 0; }
> > static inline void mptcpv6_handle_mapped(struct sock *sk, bool mapped) { }
> > #endif
> >
> > +#if defined(CONFIG_MPTCP) && defined(CONFIG_BPF_SYSCALL)
> > +struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk);
> Can this be inline ?
This function can't be inline since it uses struct mptcp_subflow_context.
mptcp_subflow_context is defined in net/mptcp/protocol.h, and we don't
want to export it to user space in net/mptcp/protocol.h.
>
> > +#else
> > +static inline struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) { return NULL; }
> > +#endif
> > +
> > #endif /* __NET_MPTCP_H */
>
> [ ... ]
>
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > new file mode 100644
> > index 000000000000..535602ba2582
> > --- /dev/null
> > +++ b/net/mptcp/bpf.c
> > @@ -0,0 +1,22 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Multipath TCP
> > + *
> > + * Copyright (c) 2020, Tessares SA.
> > + * Copyright (c) 2022, SUSE.
> > + *
> > + * Author: Nicolas Rybowski <nicolas.rybowski@tessares.net>
> > + */
> > +
> > +#define pr_fmt(fmt) "MPTCP: " fmt
> > +
> > +#include <linux/bpf.h>
> > +#include "protocol.h"
> > +
> > +struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk)
> > +{
> > + if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk))
> > + return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
> > +
> > + return NULL;
> > +}
> > +EXPORT_SYMBOL(bpf_mptcp_sock_from_subflow);
> Is EXPORT_SYMBOL needed ?
Will drop in v5.
Thanks,
-Geliang
>
^ permalink raw reply
* Re: [PATCH bpf-next v4 3/7] selftests/bpf: add MPTCP test base
From: Geliang Tang @ 2022-05-17 5:28 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Mat Martineau, Networking, bpf, Nicolas Rybowski,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
MPTCP Upstream, Matthieu Baerts, Geliang Tang
In-Reply-To: <20220517011827.6pk2ao23tb4xjuap@kafai-mbp.dhcp.thefacebook.com>
Martin KaFai Lau <kafai@fb.com> 于2022年5月17日周二 09:18写道:
>
> On Fri, May 13, 2022 at 03:48:23PM -0700, Mat Martineau wrote:
> [ ... ]
>
> > @@ -265,7 +282,7 @@ int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts)
> > }
> >
> > addr_in = (struct sockaddr_in *)&addr;
> > - fd = socket(addr_in->sin_family, type, 0);
> > + fd = socket(addr_in->sin_family, type, opts->protocol);
> ops->protocol is the same as the server_fd's protocol ?
>
> Can that be learned from getsockopt(server_fd, SOL_SOCKET, SO_PROTOCOL, ....) ?
> Then the ops->protocol additions and related changes are not needed.
Yes, I will update this in v5.
>
> connect_to_fd_opts() has already obtained the SO_TYPE in similar way.
>
> > if (fd < 0) {
> > log_err("Failed to create client socket");
> > return -1;
> > @@ -298,6 +315,16 @@ int connect_to_fd(int server_fd, int timeout_ms)
> > return connect_to_fd_opts(server_fd, &opts);
> > }
> >
> > +int connect_to_mptcp_fd(int server_fd, int timeout_ms)
> > +{
> > + struct network_helper_opts opts = {
> > + .timeout_ms = timeout_ms,
> > + .protocol = IPPROTO_MPTCP,
> > + };
> > +
> > + return connect_to_fd_opts(server_fd, &opts);
> > +}
> > +
> > int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms)
> > {
> > struct sockaddr_storage addr;
> > diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
> > index a4b3b2f9877b..e0feb115b2ae 100644
> > --- a/tools/testing/selftests/bpf/network_helpers.h
> > +++ b/tools/testing/selftests/bpf/network_helpers.h
> > @@ -21,6 +21,7 @@ struct network_helper_opts {
> > const char *cc;
> > int timeout_ms;
> > bool must_fail;
> > + int protocol;
> > };
> >
> > /* ipv4 test vector */
> > @@ -42,11 +43,14 @@ extern struct ipv6_packet pkt_v6;
> > int settimeo(int fd, int timeout_ms);
> > int start_server(int family, int type, const char *addr, __u16 port,
> > int timeout_ms);
> > +int start_mptcp_server(int family, const char *addr, __u16 port,
> > + int timeout_ms);
> > int *start_reuseport_server(int family, int type, const char *addr_str,
> > __u16 port, int timeout_ms,
> > unsigned int nr_listens);
> > void free_fds(int *fds, unsigned int nr_close_fds);
> > int connect_to_fd(int server_fd, int timeout_ms);
> > +int connect_to_mptcp_fd(int server_fd, int timeout_ms);
> > int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts);
> > int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
> > int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox