Netdev List
* Re: [PATCH v3] ath9k: dfs: Remove VLA usage
From: Kalle Valo @ 2018-04-25  7:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andreas Christoforou, Rosen Penev, Eric Dumazet, Joe Perches,
	linux-wireless, netdev, QCA ath9k Development, kernel-hardening,
	linux-kernel
In-Reply-To: <20180424235752.GA37317@beast>

Kees Cook <keescook@chromium.org> writes:

> In the quest to remove all stack VLA usage from the kernel[1], this
> redefines FFT_NUM_SAMPLES as a #define instead of const int, which still
> triggers gcc's VLA checking pass.
>
> [1] https://lkml.org/lkml/2018/3/7/621
>
> Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v3: replace FFT_NUM_SAMPLES as a #define (Joe)
> ---
>  drivers/net/wireless/ath/ath9k/dfs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/dfs.c b/drivers/net/wireless/ath/ath9k/dfs.c
> index 6fee9a464cce..e6e56a925121 100644
> --- a/drivers/net/wireless/ath/ath9k/dfs.c
> +++ b/drivers/net/wireless/ath/ath9k/dfs.c
> @@ -40,8 +40,8 @@ static const int BIN_DELTA_MIN		= 1;
>  static const int BIN_DELTA_MAX		= 10;
>  
>  /* we need at least 3 deltas / 4 samples for a reliable chirp detection */
> -#define NUM_DIFFS 3
> -static const int FFT_NUM_SAMPLES	= (NUM_DIFFS + 1);
> +#define NUM_DIFFS	3
> +#define FFT_NUM_SAMPLES	(NUM_DIFFS + 1)

I have already applied an almost identical patch:

ath9k: dfs: remove accidental use of stack VLA

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath-next&id=9c27489a34548913baaaf3b2776e05d4a9389e3e
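
For readers following the VLA cleanup, here is a minimal userspace sketch of
why the "static const int" form still counts as a VLA in C while the #define
does not. It is not the driver code: FFT_NUM_SAMPLES_CONST and both helper
functions are made up for illustration.

```c
#include <stddef.h>

#define NUM_DIFFS	3
#define FFT_NUM_SAMPLES	(NUM_DIFFS + 1)

/* In C (unlike C++), a const-qualified int is not an integer constant
 * expression, so a stack array sized by it is formally a VLA and is
 * reported by -Wvla. */
static const int FFT_NUM_SAMPLES_CONST = (NUM_DIFFS + 1);

/* Hypothetical helper: the array size is a macro, so this is a plain
 * fixed-size array. */
static size_t fixed_buf_elems(void)
{
	int buf[FFT_NUM_SAMPLES];

	return sizeof(buf) / sizeof(buf[0]);
}

/* Hypothetical helper: same size, but the bound is a const variable,
 * so the compiler treats buf as variable-length. */
static size_t vla_buf_elems(void)
{
	int buf[FFT_NUM_SAMPLES_CONST];

	return sizeof(buf) / sizeof(buf[0]);
}
```

Both helpers report the same element count; only the second one trips the
VLA check.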

-- 
Kalle Valo


* Re: [RFC bpf] bpf, x64: fix JIT emission for dead code
From: Daniel Borkmann @ 2018-04-25  7:48 UTC (permalink / raw)
  To: Gianluca Borello, netdev; +Cc: ast
In-Reply-To: <20180425054216.48961-1-g.borello@gmail.com>

On 04/25/2018 07:42 AM, Gianluca Borello wrote:
> Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
> code with a series of ja-1 instructions, for safety. That made JIT
> compilation much more complex for some BPF programs. One instance of such
> programs is, for example:
[...]
> A possible approach to mitigate this behavior consists in noticing that
> for ja-1 instructions we don't really need to rely on the estimated size
> of the previous and current instructions: we know that a -1 BPF jump
> offset can be safely translated into a 0xEB instruction with a jump offset
> of -2.
> 
> Such a fix brings the BPF program in the previous example to complete again
> in ~9 passes.
> 
> Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
> Signed-off-by: Gianluca Borello <g.borello@gmail.com>

Thanks for reporting, Gianluca. The approach your fix takes looks good to me!
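
To make the encoding concrete, here is a small sketch. It is not the JIT
code, and the helper name is invented. A BPF "ja -1" jumps to itself; on x86
the short jump (opcode 0xEB) takes a rel8 displacement counted from the end
of the 2-byte instruction, so a jump to itself uses a displacement of -2 and
its size is known without any estimate.

```c
#include <stdint.h>

/* Hypothetical helper: emit an x86 short jump that targets its own
 * first byte. Returns the number of bytes written, which is always 2. */
static int emit_jmp_to_self(uint8_t *buf)
{
	buf[0] = 0xEB;			/* JMP rel8 */
	buf[1] = (uint8_t)(int8_t)-2;	/* displacement -2, encoded as 0xFE */
	return 2;
}
```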


* Re: [PATCH] qtnfmac: fix qtnf_netdev_hard_start_xmit()'s return type
From: Sergey Matyukevich @ 2018-04-25  7:49 UTC (permalink / raw)
  To: Luc Van Oostenryck
  Cc: linux-kernel, Igor Mitsyanko, Avinash Patil, Sergey Matyukevich,
	Kalle Valo, Kees Cook, linux-wireless, netdev
In-Reply-To: <20180424131810.4963-1-luc.vanoostenryck@gmail.com>

Hi Luc and all,

> The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
> which is a typedef for an enum type, but the implementation in this
> driver returns an 'int'.
> 
> Fix this by returning 'netdev_tx_t' in this driver too.
> 
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> ---
>  drivers/net/wireless/quantenna/qtnfmac/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/quantenna/qtnfmac/core.c b/drivers/net/wireless/quantenna/qtnfmac/core.c
> index cf26c15a8..b3bfb4faa 100644
> --- a/drivers/net/wireless/quantenna/qtnfmac/core.c
> +++ b/drivers/net/wireless/quantenna/qtnfmac/core.c
> @@ -76,7 +76,7 @@ static int qtnf_netdev_close(struct net_device *ndev)
> 
>  /* Netdev handler for data transmission.
>   */
> -static int
> +static netdev_tx_t
>  qtnf_netdev_hard_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>  {
>         struct qtnf_vif *vif;

The previous ACK from Igor slipped through the cracks due to
Outlook/Exchange issues, so here is another one.

Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>

Thanks for the fix!

Regards,
Sergey


* Re: [PATCH bpf-next] bpf: clear the ip_tunnel_info.
From: Daniel Borkmann @ 2018-04-25  7:54 UTC (permalink / raw)
  To: William Tu, netdev
In-Reply-To: <1524638819-31626-1-git-send-email-u9012063@gmail.com>

On 04/25/2018 08:46 AM, William Tu wrote:
> The percpu metadata_dst might carry a stale ip_tunnel_info
> and cause incorrect behavior.  When mixing tests using ipv4/ipv6
> bpf vxlan and geneve tunnels, the ipv6 tunnel info incorrectly uses
> ipv4's src ip addr as its ipv6 src address, because the previous
> tunnel info is not cleaned up.  The patch zeros the fields in
> ip_tunnel_info.
> 
> Signed-off-by: William Tu <u9012063@gmail.com>
> Reported-by: Yifeng Sun <pkusunyifeng@gmail.com>

Since this is a fix, I've applied this to bpf, thanks William!
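
As an illustration of the failure mode, here is a simplified sketch. The
struct and function names are hypothetical stand-ins, not the kernel's
ip_tunnel_info layout or the actual patch. A reused slot keeps whatever the
previous user wrote unless it is zeroed before being filled in again.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a reusable per-cpu tunnel slot. */
struct tun_info {
	uint32_t v4_src;
	uint8_t v6_src[16];
	int is_ipv6;
};

static void tun_set_v4(struct tun_info *ti, uint32_t src)
{
	memset(ti, 0, sizeof(*ti));
	ti->v4_src = src;
}

/* Buggy variant: fills in the v6 fields but leaves old v4 data behind. */
static void tun_set_v6_stale(struct tun_info *ti, const uint8_t *src)
{
	ti->is_ipv6 = 1;
	memcpy(ti->v6_src, src, sizeof(ti->v6_src));
}

/* Fixed variant: zero the whole slot first, then fill in the v6 fields. */
static void tun_set_v6(struct tun_info *ti, const uint8_t *src)
{
	memset(ti, 0, sizeof(*ti));
	ti->is_ipv6 = 1;
	memcpy(ti->v6_src, src, sizeof(ti->v6_src));
}
```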


* Re: [PATCH bpf-next 0/4] nfp: bpf: optimize negative sums
From: Daniel Borkmann @ 2018-04-25  7:58 UTC (permalink / raw)
  To: Jakub Kicinski, alexei.starovoitov; +Cc: oss-drivers, netdev
In-Reply-To: <20180425042239.27869-1-jakub.kicinski@netronome.com>

On 04/25/2018 06:22 AM, Jakub Kicinski wrote:
> Hi!
> 
> This set adds an optimization run to the NFP jit to turn ADD and SUB
> instructions with negative immediate into the opposite operation with
> a positive immediate.  NFP can fit small immediates into the instructions
> but it can't ever fit negative immediates.  Addition of small negative
> immediates is quite common in BPF programs for stack address calculations,
> therefore this optimization gives us non-negligible savings in instruction
> count (up to 4%).

Applied to bpf-next, thanks Jakub!
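
The transformation described in the cover letter can be sketched roughly as
below. This is an assumption-laden illustration, not the NFP JIT code: the
enum, struct and function names are invented, and the INT32_MIN guard is my
own addition because that value has no positive counterpart.

```c
#include <stdbool.h>
#include <stdint.h>

enum alu_op { ALU_ADD, ALU_SUB };

struct alu_insn {
	enum alu_op op;
	int32_t imm;
};

/* Turn ADD/SUB with a negative immediate into the opposite operation
 * with a positive immediate. Returns true if the instruction changed. */
static bool flip_negative_imm(struct alu_insn *insn)
{
	if (insn->imm >= 0 || insn->imm == INT32_MIN)
		return false;

	insn->op = (insn->op == ALU_ADD) ? ALU_SUB : ALU_ADD;
	insn->imm = -insn->imm;
	return true;
}
```

For example, a stack address calculation such as "r1 += -8" becomes
"r1 -= 8", whose immediate is small and positive.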


* Re: [1/2] net: wireless: zydas: Replace mdelay with msleep in zd1201_probe
From: Kalle Valo @ 2018-04-25  8:15 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: davem, stephen, arvind.yadav.cs, johannes.berg, linux-wireless,
	netdev, linux-kernel, Jia-Ju Bai
In-Reply-To: <1523367004-31935-1-git-send-email-baijiaju1990@gmail.com>

Jia-Ju Bai <baijiaju1990@gmail.com> wrote:

> zd1201_probe() is never called in atomic context.
> 
> zd1201_probe() is only set as ".probe" in struct usb_driver.
> 
> Despite never getting called from atomic context, zd1201_probe()
> calls mdelay() to busy-wait.
> This is not necessary and can be replaced with msleep() to
> avoid busy waiting.
> 
> This was found by DCNS, a static analysis tool I wrote.
> I also checked it manually.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>

I need a review from someone else before I'm willing to take this.

-- 
https://patchwork.kernel.org/patch/10333189/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
From: Leo Yan @ 2018-04-25  8:18 UTC (permalink / raw)
  To: David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Kirill Tkhai, netdev, linux-kernel
  Cc: Leo Yan

With the BPF_JIT_ALWAYS_ON config enabled, bpf_jit_enable is always equal to
1; it is impossible to set 'bpf_jit_enable = 2', so the kernel has no
chance to call bpf_jit_dump().

This patch relaxes the bpf_jit_enable range to [1..2] when the kernel config
BPF_JIT_ALWAYS_ON is enabled, so that the JIT dump can be invoked.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 net/core/sysctl_net_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b1a2c5e..6a39b22 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
 		.proc_handler	= proc_dointvec_minmax_bpf_enable,
 # ifdef CONFIG_BPF_JIT_ALWAYS_ON
 		.extra1		= &one,
-		.extra2		= &one,
+		.extra2		= &two,
 # else
 		.extra1		= &zero,
 		.extra2		= &two,
-- 
1.9.1
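
The effect of the .extra1/.extra2 bounds can be modelled in userspace as
follows. This is only a sketch of min/max-bounded write semantics; the
function name is hypothetical and it is not the kernel's proc handler.

```c
#include <errno.h>

/* Hypothetical stand-in for a min/max-bounded sysctl write: values
 * outside [min, max] are rejected and the stored value is unchanged. */
static int write_bounded(int *val, int new_val, int min, int max)
{
	if (new_val < min || new_val > max)
		return -EINVAL;

	*val = new_val;
	return 0;
}
```

With bounds [1..1], writing 2 is rejected; with [1..2] it is accepted, while
0 stays rejected in both cases.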


* Re: brcmsmac: phy_lcn: remove duplicate code
From: Kalle Valo @ 2018-04-25  8:24 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, linux-wireless, brcm80211-dev-list.pdl,
	brcm80211-dev-list, netdev, linux-kernel, Gustavo A. R. Silva
In-Reply-To: <20180405000944.GA22743@embeddedor.com>

"Gustavo A. R. Silva" <gustavo@embeddedor.com> wrote:

> Remove and refactor some code in order to avoid having identical code
> for different branches.
> 
> Notice that this piece of code hasn't been modified since 2011.
> 
> Addresses-Coverity-ID: 1226756 ("Identical code for different branches")
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Patch applied to wireless-drivers-next.git, thanks.

863683cfbbfc brcmsmac: phy_lcn: remove duplicate code

-- 
https://patchwork.kernel.org/patch/10323665/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: qtnfmac: pearl: pcie: fix memory leak in qtnf_fw_work_handler
From: Kalle Valo @ 2018-04-25  8:26 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Igor Mitsyanko, Avinash Patil, Sergey Matyukevich, linux-wireless,
	netdev, linux-kernel, Gustavo A. R. Silva
In-Reply-To: <20180405154949.GA32223@embeddedor.com>

"Gustavo A. R. Silva" <gustavo@embeddedor.com> wrote:

> In case memory resources for fw were successfully allocated, release
> them before jumping to fw_load_fail.
> 
> Addresses-Coverity-ID: 1466092 ("Resource leak")
> Fixes: c3b2f7ca4186 ("qtnfmac: implement asynchronous firmware loading")
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>

Patch applied to wireless-drivers-next.git, thanks.

376377004464 qtnfmac: pearl: pcie: fix memory leak in qtnf_fw_work_handler

-- 
https://patchwork.kernel.org/patch/10324855/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: [PATCH] net: net_cls: remove a NULL check for css_cls_state
From: Li RongQing @ 2018-04-25  8:35 UTC (permalink / raw)
  To: David Miller; +Cc: lirongqing, netdev
In-Reply-To: <20180420.103725.37816458687606953.davem@davemloft.net>

On 4/20/18, David Miller <davem@davemloft.net> wrote:
> From: Li RongQing <lirongqing@baidu.com>
> Date: Thu, 19 Apr 2018 12:59:21 +0800
>
>> The input of css_cls_state() cannot be NULL except in
>> cgrp_css_online, so simplify it
>>
>> Signed-off-by: Li RongQing <lirongqing@baidu.com>
>
> I don't view this as an improvement.  Just let the helper always check
> NULL and that way there are fewer situations to audit.
>
css_cls_state() may return NULL, but almost no callers check the return
value against NULL, which makes the code hard to read.

net/core/netclassid_cgroup.c:27:        return css_cls_state(task_css_check(p, net_cls_cgrp_id,
net/core/netclassid_cgroup.c:46:        struct cgroup_cls_state *cs = css_cls_state(css);
net/core/netclassid_cgroup.c:47:        struct cgroup_cls_state *parent = css_cls_state(css->parent);
net/core/netclassid_cgroup.c:57:        kfree(css_cls_state(css));
net/core/netclassid_cgroup.c:82:                           (void *)(unsigned long)css_cls_state(css)->classid);
net/core/netclassid_cgroup.c:89:        return css_cls_state(css)->classid;
net/core/netclassid_cgroup.c:95:        struct cgroup_cls_state *cs = css_cls_state(css);

> And it's not like this is a critical fast path either.
>

I see that css_cls_state() is called when sending a packet if
CONFIG_NET_CLS_ACT and CONFIG_NET_EGRESS are enabled; the call stack
looks like this:

css_cls_state
  task_cls_state
    task_get_classid
       cls_cgroup_classify
          tcf_classify
            sch_handle_egress
               __dev_queue_xmit
                        CONFIG_NET_CLS_ACT
                         CONFIG_NET_EGRESS

-RongQing






> I'm not applying this, sorry.
>


* Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie
From: John Hurley @ 2018-04-25  8:51 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	ASAP_Direct_Dev
In-Reply-To: <CAJ3xEMg=E+mq=QGaAtxWBY1NvWG3vcwhNFJ=p7xPLBy7X9nE0Q@mail.gmail.com>

On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
>> From: John Hurley <john.hurley@netronome.com>
>>
>> When multiple netdevs are attached to a tc offload block and register for
>> callbacks, a rule added to the block will be propagated to all netdevs.
>> Previously these were detected as duplicates (based on cookie) and
>> rejected. Modify the rule nfp lookup function to optionally include an
>> ingress netdev and a host context along with the cookie value when
>> searching for a rule. When a new rule is passed to the driver, the netdev
>> the rule is to be attached to is considered when searching for duplicates.
>
> so if the same rule (cookie) is provided to the driver through multiple ingress
> devices you will not reject it -- what is the use case for that, is it
> block sharing?

Hi Or,
Yes, block sharing is the current use-case.
Simple example for clarity....
Here we want to offload the filter to both ingress devs nfp_p0 and nfp_p1:

tc qdisc add dev nfp_p0 ingress_block 22 ingress
tc qdisc add dev nfp_p1 ingress_block 22 ingress
tc filter add block 22 protocol ip parent ffff: flower skip_sw ip_proto tcp action drop


* Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie
From: Or Gerlitz @ 2018-04-25  8:56 UTC (permalink / raw)
  To: John Hurley
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	ASAP_Direct_Dev
In-Reply-To: <CAK+XE=mu8qEPt-J9_6KmpJeyrSCKcytRT20ZGh+cHFY=j1vudg@mail.gmail.com>

On Wed, Apr 25, 2018 at 11:51 AM, John Hurley <john.hurley@netronome.com> wrote:
> On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
>> <jakub.kicinski@netronome.com> wrote:
>>> From: John Hurley <john.hurley@netronome.com>
>>>
>>> When multiple netdevs are attached to a tc offload block and register for
>>> callbacks, a rule added to the block will be propagated to all netdevs.
>>> Previously these were detected as duplicates (based on cookie) and
>>> rejected. Modify the rule nfp lookup function to optionally include an
>>> ingress netdev and a host context along with the cookie value when
>>> searching for a rule. When a new rule is passed to the driver, the netdev
>>> the rule is to be attached to is considered when searching for duplicates.
>>
>> so if the same rule (cookie) is provided to the driver through multiple ingress
>> devices you will not reject it -- what is the use case for that, is it
>> block sharing?
>
> Hi Or,
> Yes, block sharing is the current use-case.
> Simple example for clarity....
> Here we want to offload the filter to both ingress devs nfp_p0 and nfp_p1:
>
> tc qdisc add dev nfp_p0 ingress_block 22 ingress
> tc qdisc add dev nfp_p1 ingress_block 22 ingress
> tc filter add block 22 protocol ip parent ffff: flower skip_sw ip_proto tcp action drop

cool!

Just out of curiosity, do you actually share this HW rule or do you duplicate it?


* Re: [PATCH bpf-next v6 02/10] bpf: add bpf_get_stack helper
From: Daniel Borkmann @ 2018-04-25  9:00 UTC (permalink / raw)
  To: Yonghong Song, ast, netdev, ecree; +Cc: kernel-team
In-Reply-To: <20180423212752.986580-3-yhs@fb.com>

On 04/23/2018 11:27 PM, Yonghong Song wrote:
> Currently, stackmap and bpf_get_stackid helper are provided
> for bpf program to get the stack trace. This approach has
> a limitation though. If two stack traces have the same hash,
> only one will get stored in the stackmap table,
> so some stack traces are missing from user perspective.
> 
> This patch implements a new helper, bpf_get_stack, which will
> send stack traces directly to the bpf program. The bpf program
> is able to see all stack traces, and then can do in-kernel
> processing or send stack traces to user space through
> shared map or bpf_perf_event_output.
> 
> Acked-by: Alexei Starovoitov <ast@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
[...]
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index d315b39..bf22eca 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -31,6 +31,7 @@
>  #include <linux/rbtree_latch.h>
>  #include <linux/kallsyms.h>
>  #include <linux/rcupdate.h>
> +#include <linux/perf_event.h>
>  
>  #include <asm/unaligned.h>
>  
> @@ -1709,6 +1710,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>  	aux = container_of(work, struct bpf_prog_aux, work);
>  	if (bpf_prog_is_dev_bound(aux))
>  		bpf_prog_offload_destroy(aux->prog);
> +#ifdef CONFIG_PERF_EVENTS
> +	if (aux->prog->need_callchain_buf)
> +		put_callchain_buffers();
> +#endif
>  	for (i = 0; i < aux->func_cnt; i++)
>  		bpf_jit_free(aux->func[i]);
>  	if (aux->func_cnt) {
[...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index fe23dc5a..1ee71f6 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1360,6 +1360,16 @@ static int bpf_prog_load(union bpf_attr *attr)
>  	if (err)
>  		goto free_used_maps;
>  
> +	if (prog->need_callchain_buf) {
> +#ifdef CONFIG_PERF_EVENTS
> +		err = get_callchain_buffers(sysctl_perf_event_max_stack);
> +#else
> +		err = -ENOTSUPP;
> +#endif
> +		if (err)
> +			goto free_used_maps;
> +	}
> +
>  	err = bpf_prog_new_fd(prog);
>  	if (err < 0) {
>  		/* failed to allocate fd.
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 5dd1dcb..aba9425 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2460,6 +2460,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>  	if (err)
>  		return err;
>  
> +	if (func_id == BPF_FUNC_get_stack)
> +		env->prog->need_callchain_buf = true;
> +
>  	if (changes_data)
>  		clear_all_pkt_pointers(env);
>  	return 0;

The above three hunks will cause a use-after-free on the perf callchain buffers.

In check_helper_call() you mark the prog with need_callchain_buf, but at that
point the program hasn't completed the verification phase yet, meaning a buggy
prog can still be rejected later on.

However, you do the get_callchain_buffers() at a much later phase, so when you
bail out with error from bpf_check(), you take the free_used_maps error path
which calls bpf_prog_free().

The latter calls into bpf_prog_free_deferred() where you do the put_callchain_buffers()
since the need_callchain_buf is marked, but without prior get_callchain_buffers().
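
The sequence described above can be modelled in userspace as follows. This
is only a sketch: the counter, struct and function names are invented
stand-ins for the perf callchain buffer refcount, and the "balanced" variant
is just one possible way to keep get and put paired, not necessarily how the
patch should be fixed.

```c
#include <stdbool.h>

/* Hypothetical model of the shared callchain buffer refcount. */
static int callchain_refs;

struct prog {
	bool need_callchain_buf;	/* set during verification */
	bool has_callchain_ref;		/* set only after a successful get */
};

/* Models the late get in the load path. */
static int prog_take_buffers(struct prog *p)
{
	callchain_refs++;
	p->has_callchain_ref = true;
	return 0;
}

/* Problematic teardown: puts whenever the flag is set, even if the
 * load failed before the matching get ever happened. */
static void prog_free_unbalanced(struct prog *p)
{
	if (p->need_callchain_buf)
		callchain_refs--;
}

/* One possible balanced teardown: only put what was actually taken. */
static void prog_free_balanced(struct prog *p)
{
	if (p->has_callchain_ref) {
		callchain_refs--;
		p->has_callchain_ref = false;
	}
}
```

If another user holds the only reference, the unbalanced teardown of a
rejected program drops the count to zero underneath that user, which is the
use-after-free scenario.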


* Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie
From: John Hurley @ 2018-04-25  9:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	ASAP_Direct_Dev
In-Reply-To: <CAJ3xEMhv5e9gOx-qELt=YMcpe9Svi=6uKYXs4c9PXDXOR35kow@mail.gmail.com>

On Wed, Apr 25, 2018 at 9:56 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 11:51 AM, John Hurley <john.hurley@netronome.com> wrote:
>> On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
>>> <jakub.kicinski@netronome.com> wrote:
>>>> From: John Hurley <john.hurley@netronome.com>
>>>>
>>>> When multiple netdevs are attached to a tc offload block and register for
>>>> callbacks, a rule added to the block will be propagated to all netdevs.
>>>> Previously these were detected as duplicates (based on cookie) and
>>>> rejected. Modify the rule nfp lookup function to optionally include an
>>>> ingress netdev and a host context along with the cookie value when
>>>> searching for a rule. When a new rule is passed to the driver, the netdev
>>>> the rule is to be attached to is considered when searching for duplicates.
>>>
>>> so if the same rule (cookie) is provided to the driver through multiple ingress
>>> devices you will not reject it -- what is the use case for that, is it
>>> block sharing?
>>
>> Hi Or,
>> Yes, block sharing is the current use-case.
>> Simple example for clarity....
>> Here we want to offload the filter to both ingress devs nfp_p0 and nfp_p1:
>>
>> tc qdisc add dev nfp_p0 ingress_block 22 ingress
>> tc qdisc add dev nfp_p1 ingress_block 22 ingress
>> tc filter add block 22 protocol ip parent ffff: flower skip_sw ip_proto tcp action drop
>
> cool!
>
> Just out of curiosity, do you actually share this HW rule or do you duplicate it?

It's duplicated.
At HW level the ingress port is part of the match so technically it's
a different rule.
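
A rough sketch of the lookup idea follows, under the assumption that a rule
is identified by its cookie plus the ingress port. The table layout and
names are hypothetical and this is not the nfp driver code.

```c
#include <stddef.h>

/* Hypothetical offloaded-rule entry keyed by cookie and ingress port. */
struct flow_entry {
	unsigned long cookie;
	int ingress_port;
};

/* Linear lookup: an entry matches only if both the cookie and the
 * ingress port match, so one cookie may appear once per port. */
static const struct flow_entry *
flow_lookup(const struct flow_entry *tbl, size_t n,
	    unsigned long cookie, int ingress_port)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].cookie == cookie &&
		    tbl[i].ingress_port == ingress_port)
			return &tbl[i];
	}
	return NULL;
}
```

With this key, the same cookie arriving for a second port is not seen as a
duplicate, whereas a repeat for the same port still is.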


* Re: [PATCH bpf-next 13/15] xsk: support for Tx
From: Magnus Karlsson @ 2018-04-25  9:11 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Alexander Duyck, John Fastabend, Alexei Starovoitov,
	Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	Network Development, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <CAF=yD-JxQsJuJMh4=3An=oE0+R6FJ7f7CnUmQP41EOjEMc7VmQ@mail.gmail.com>

On Tue, Apr 24, 2018 at 6:57 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 9:56 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
>> From: Magnus Karlsson <magnus.karlsson@intel.com>
>>
>> Here, Tx support is added. The user fills the Tx queue with frames to
>> be sent by the kernel, and lets the kernel know using the sendmsg
>> syscall.
>>
>> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
>
>> +static int xsk_xmit_skb(struct sk_buff *skb)
>
> This is basically packet_direct_xmit. Might be better to just move that
> to net/core/dev.c and use in both AF_PACKET and AF_XDP.

It is packet_direct_xmit with some code removed that is not used :-),
so your suggestion makes a lot of sense. Will implement in this patch
set.

> Also, (eventually) AF_XDP may also want to support the regular path
> through dev_queue_xmit to go through traffic shaping.

Agreed. Will put this on the todo list for a later patch.

>> +{
>> +       struct net_device *dev = skb->dev;
>> +       struct sk_buff *orig_skb = skb;
>> +       struct netdev_queue *txq;
>> +       int ret = NETDEV_TX_BUSY;
>> +       bool again = false;
>> +
>> +       if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
>> +               goto drop;
>> +
>> +       skb = validate_xmit_skb_list(skb, dev, &again);
>> +       if (skb != orig_skb)
>> +               return NET_XMIT_DROP;
>
> Need to free generated segment list on error, see packet_direct_xmit.

I do not use segments in the TX code for reasons of simplicity and the
free is in the calling function. But as I will create a common
packet_direct_xmit according to your suggestion, it will have a
kfree_skb_list() there as in af_packet.c.

>> +
>> +       txq = skb_get_tx_queue(dev, skb);
>> +
>> +       local_bh_disable();
>> +
>> +       HARD_TX_LOCK(dev, txq, smp_processor_id());
>> +       if (!netif_xmit_frozen_or_drv_stopped(txq))
>> +               ret = netdev_start_xmit(skb, dev, txq, false);
>> +       HARD_TX_UNLOCK(dev, txq);
>> +
>> +       local_bh_enable();
>> +
>> +       if (!dev_xmit_complete(ret))
>> +               goto out_err;
>> +
>> +       return ret;
>> +drop:
>> +       atomic_long_inc(&dev->tx_dropped);
>> +out_err:
>> +       return NET_XMIT_DROP;
>> +}
>
>> +static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
>> +                           size_t total_len)
>> +{
>> +       bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
>> +       u32 max_batch = TX_BATCH_SIZE;
>> +       struct xdp_sock *xs = xdp_sk(sk);
>> +       bool sent_frame = false;
>> +       struct xdp_desc desc;
>> +       struct sk_buff *skb;
>> +       int err = 0;
>> +
>> +       if (unlikely(!xs->tx))
>> +               return -ENOBUFS;
>> +       if (need_wait)
>> +               return -EOPNOTSUPP;
>> +
>> +       mutex_lock(&xs->mutex);
>> +
>> +       while (xskq_peek_desc(xs->tx, &desc)) {
>
> It is possible to pass a chain of skbs to validate_xmit_skb_list and
> eventually pass this chain to xsk_xmit_skb, amortizing the cost of
> taking the txq lock. Fine to ignore for this patch set.

Good suggestion. Will put it down on the todo list for a later patch set.

>> +               char *buffer;
>> +               u32 id, len;
>> +
>> +               if (max_batch-- == 0) {
>> +                       err = -EAGAIN;
>> +                       goto out;
>> +               }
>> +
>> +               if (xskq_reserve_id(xs->umem->cq)) {
>> +                       err = -EAGAIN;
>> +                       goto out;
>> +               }
>> +
>> +               len = desc.len;
>> +               if (unlikely(len > xs->dev->mtu)) {
>> +                       err = -EMSGSIZE;
>> +                       goto out;
>> +               }
>> +
>> +               skb = sock_alloc_send_skb(sk, len, !need_wait, &err);
>> +               if (unlikely(!skb)) {
>> +                       err = -EAGAIN;
>> +                       goto out;
>> +               }
>> +
>> +               skb_put(skb, len);
>> +               id = desc.idx;
>> +               buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
>> +               err = skb_store_bits(skb, 0, buffer, len);
>> +               if (unlikely(err))
>> +                       goto out_store;
>
> As xsk_destruct_skb delays notification until consume_skb is called, this
> copy can be avoided by linking the xdp buffer into the skb frags array,
> analogous to tpacket_snd.
>
> You probably don't care much about the copy slow path, and this can be
> implemented later, so also no need to do in this patchset.

Agreed. I will also put this in the todo list for a later patch set.

> static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
> +                                             struct xdp_desc *desc)
> +{
> +       struct xdp_rxtx_ring *ring;
> +
> +       if (q->cons_tail == q->cons_head) {
> +               WRITE_ONCE(q->ring->consumer, q->cons_tail);
> +               q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
> +
> +               /* Order consumer and data */
> +               smp_rmb();
> +
> +               return xskq_validate_desc(q, desc);
> +       }
> +
> +       ring = (struct xdp_rxtx_ring *)q->ring;
> +       *desc = ring->desc[q->cons_tail & q->ring_mask];
> +       return desc;
>
> This only validates descriptors if taking the branch.

Yes, that is because we only want to validate the descriptors once
even if we call this function multiple times for the same entry.

Thanks, Will. Your comments are highly appreciated.

/Magnus


* Re: [PATCH v3 ipsec-next] xfrm: remove VLA usage in __xfrm6_sort()
From: Stefano Brivio @ 2018-04-25  9:11 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andreas Christoforou, kernel-hardening, Steffen Klassert,
	Herbert Xu, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	netdev, linux-kernel
In-Reply-To: <20180424234651.GA30225@beast>

Hi Kees,

On Tue, 24 Apr 2018 16:46:51 -0700
Kees Cook <keescook@chromium.org> wrote:

> In the quest to remove all stack VLA usage from the kernel[1],
> just use XFRM_MAX_DEPTH as already done for the "class" array. In one
> case, it'll do this loop up to 5, the other caller up to 6.
> 
> [1] https://lkml.org/lkml/2018/3/7/621
> 
> Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> v3:
> - adjust Subject and commit log (Steffen)
> - use "= { }" instead of memset() (Stefano)
> - reorder variables (Stefano)
> v2:
> - use XFRM_MAX_DEPTH for "count" array (Steffen and Mathias).
> ---
>  net/ipv6/xfrm6_state.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
> index 16f434791763..eeb44b64ae7f 100644
> --- a/net/ipv6/xfrm6_state.c
> +++ b/net/ipv6/xfrm6_state.c
> @@ -60,9 +60,9 @@ xfrm6_init_temprop(struct xfrm_state *x, const struct xfrm_tmpl *tmpl,
>  static int
>  __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void *p), int maxclass)
>  {
> -	int i;
> +	int count[XFRM_MAX_DEPTH] = { };
>  	int class[XFRM_MAX_DEPTH];
> -	int count[maxclass];
> +	int i;
>  
>  	memset(count, 0, sizeof(count));

I guess you forgot to remove the memset() here. Just to be clear, I
think this is what it should look like:

--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -60,11 +60,9 @@ xfrm6_init_temprop(struct xfrm_state *x, const struct xfrm_tmpl *tmpl,
 static int
 __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void *p), int maxclass)
 {
-       int i;
+       int count[XFRM_MAX_DEPTH] = { };
        int class[XFRM_MAX_DEPTH];
-       int count[maxclass];
-
-       memset(count, 0, sizeof(count));
+       int i;
 
        for (i = 0; i < n; i++) {
                int c;
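
As a side note on the initializer, here is a standalone sketch; the macro
and function below are stand-ins, not the xfrm code. A fixed-size array with
an initializer is fully zeroed at declaration, so a following memset() is
redundant. The sketch uses "= { 0 }" because the empty "= { }" form is a GNU
extension before C23.

```c
#define MAX_DEPTH 6	/* stand-in for XFRM_MAX_DEPTH */

/* Count how many of the n class values equal 'which', using a
 * fixed-size, zero-initialized counter array instead of a VLA. */
static int count_in_class(const int *classes, int n, int which)
{
	int count[MAX_DEPTH] = { 0 };	/* every element starts at zero */
	int i;

	for (i = 0; i < n; i++) {
		if (classes[i] >= 0 && classes[i] < MAX_DEPTH)
			count[classes[i]]++;
	}
	return (which >= 0 && which < MAX_DEPTH) ? count[which] : 0;
}
```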

-- 
Stefano


* Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
From: Daniel Borkmann @ 2018-04-25  9:12 UTC (permalink / raw)
  To: Leo Yan, David S. Miller, Alexei Starovoitov, Kirill Tkhai,
	netdev, linux-kernel
In-Reply-To: <1524644322-9263-1-git-send-email-leo.yan@linaro.org>

On 04/25/2018 10:18 AM, Leo Yan wrote:
> With the BPF_JIT_ALWAYS_ON config enabled, bpf_jit_enable is always equal to
> 1; it is impossible to set 'bpf_jit_enable = 2', so the kernel has no
> chance to call bpf_jit_dump().
> 
> This patch relaxes the bpf_jit_enable range to [1..2] when the kernel config
> BPF_JIT_ALWAYS_ON is enabled, so that the JIT dump can be invoked.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>

Is there a specific reason why you need this here instead of retrieving the
dump from the newer interface available from bpftool (tools/bpf/bpftool/)?
Setting bpf_jit_enable = 2 is not recommended these days since it dumps into
the kernel log, which is often readable by unprivileged users as well. bpftool
makes use of the BPF_OBJ_GET_INFO_BY_FD interface via the bpf syscall to get
the JIT dump instead when bpf_jit_enable is set.

> ---
>  net/core/sysctl_net_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index b1a2c5e..6a39b22 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
>  		.proc_handler	= proc_dointvec_minmax_bpf_enable,
>  # ifdef CONFIG_BPF_JIT_ALWAYS_ON
>  		.extra1		= &one,
> -		.extra2		= &one,
> +		.extra2		= &two,
>  # else
>  		.extra1		= &zero,
>  		.extra2		= &two,
> 


* Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie
From: Or Gerlitz @ 2018-04-25  9:13 UTC (permalink / raw)
  To: John Hurley
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	ASAP_Direct_Dev
In-Reply-To: <CAK+XE==18NFnRYDZKhREmbVawbeUHj1W=HeymLzBBHwnXSfOKg@mail.gmail.com>

On Wed, Apr 25, 2018 at 12:02 PM, John Hurley <john.hurley@netronome.com> wrote:
> On Wed, Apr 25, 2018 at 9:56 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Wed, Apr 25, 2018 at 11:51 AM, John Hurley <john.hurley@netronome.com> wrote:
>>> On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
>>>> <jakub.kicinski@netronome.com> wrote:
>>>>> From: John Hurley <john.hurley@netronome.com>
>>>>>
>>>>> When multiple netdevs are attached to a tc offload block and register for
>>>>> callbacks, a rule added to the block will be propagated to all netdevs.
>>>>> Previously these were detected as duplicates (based on cookie) and
>>>>> rejected. Modify the rule nfp lookup function to optionally include an
>>>>> ingress netdev and a host context along with the cookie value when
>>>>> searching for a rule. When a new rule is passed to the driver, the netdev
>>>>> the rule is to be attached to is considered when searching for duplicates.
>>>>
>>>> so if the same rule (cookie) is provided to the driver through multiple ingress
>>>> devices you will not reject it -- what is the use case for that, is it
>>>> block sharing?
>>>
>>> Hi Or,
>>> Yes, block sharing is the current use-case.
>>> Simple example for clarity....
>>> Here we want to offload the filter to both ingress devs nfp_p0 and nfp_p1:
>>>
>>> tc qdisc add dev nfp_p0 ingress_block 22 ingress
>>> tc qdisc add dev nfp_p1 ingress_block 22 ingress
>>> tc filter add block 22 protocol ip parent ffff: flower skip_sw \
>>>     ip_proto tcp action drop
>>
>> cool!
>>
>> Just out of curiosity, do you actually share this HW rule or you duplicate it?
>
> It's duplicated. At HW level the ingress port is part of the match so technically it's
> a different rule.

I see. We also have a match on the ingress port as part of the HW API, which
means we will have to apply a similar practice if we want to support
block sharing quickly.

Just to make sure: under tc block sharing, the tc stack calls for hw
offloading of the same rule (same cookie) multiple times, each with a
different ingress device, right?
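
To make the lookup discussed above concrete, here is a minimal sketch of a
rule lookup keyed on the cookie plus an optional ingress device. The struct
flow_entry and flow_lookup() are hypothetical names for illustration, the
host context mentioned in the commit message is left out, and this is not the
nfp driver's code.

```c
#include <stddef.h>

/* Hypothetical flow table entry; field names are illustrative only. */
struct flow_entry {
	unsigned long cookie;		/* tc rule cookie */
	const void *ingress_dev;	/* device the rule was offloaded on */
};

/*
 * Look up a flow by cookie. When ingress_dev is non-NULL it must match
 * as well, so the same cookie can exist once per ingress device.
 * A NULL ingress_dev gives a cookie-only lookup.
 */
const struct flow_entry *flow_lookup(const struct flow_entry *tbl, size_t n,
				     unsigned long cookie,
				     const void *ingress_dev)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cookie != cookie)
			continue;
		if (ingress_dev && tbl[i].ingress_dev != ingress_dev)
			continue;
		return &tbl[i];
	}
	return NULL;
}
```

With this shape, adding cookie 42 on a second port is only a duplicate if an
entry with cookie 42 already exists for that same port.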


Or.

* Re: [PATCH net-next 4/4] nfp: flower: ignore duplicate cb requests for same rule
From: Or Gerlitz @ 2018-04-25  9:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Miller, Linux Netdev List, oss-drivers, John Hurley
In-Reply-To: <20180425041704.26882-5-jakub.kicinski@netronome.com>

On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: John Hurley <john.hurley@netronome.com>
>
> If a flower rule has a repr both as ingress and egress port then 2
> callbacks may be generated for the same rule request.
>
> Add an indicator to each flow as to whether or not it was added from an
> ingress registered cb. If so then ignore add/del/stat requests to it from
> an egress cb.

So on add() you ignore (return success). I wasn't sure from the patch
what you do for stat()/del(): success? Why not err? As you know, I am
working on the same patch for mlx5, so let's align here please.

Or.

* Re: [PATCH 2/8] dmaengine: shdmac: Change platform check to CONFIG_ARCH_RENESAS
From: Vinod Koul @ 2018-04-25  9:18 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: alsa-devel, Kuninori Morimoto, Catalin Marinas, Will Deacon,
	Liam Girdwood, Laurent Pinchart, devel, Mauro Carvalho Chehab,
	Magnus Damm, Russell King, linux-media, Arnd Bergmann, Mark Brown,
	Dan Williams, Jaroslav Kysela, linux-arm-kernel, Sergei Shtylyov,
	Greg Kroah-Hartman, Takashi Iwai, linux-kernel, linux-renesas-soc,
	Simon Horman, netdev, dmaengine
In-Reply-To: <1524230914-10175-3-git-send-email-geert+renesas@glider.be>

On Fri, Apr 20, 2018 at 03:28:28PM +0200, Geert Uytterhoeven wrote:
> Since commit 9b5ba0df4ea4f940 ("ARM: shmobile: Introduce ARCH_RENESAS"),
> CONFIG_ARCH_RENESAS is a more appropriate platform check than the legacy
> CONFIG_ARCH_SHMOBILE, hence use the former.
> 
> Renesas SuperH SH-Mobile SoCs are still covered by the CONFIG_CPU_SH4
> check, just like before support for Renesas ARM SoCs was added.
> 
> Instead of blindly changing all the #ifdefs, switch the main code block
> in sh_dmae_probe() to IS_ENABLED(), as this allows removing all the
> remaining #ifdefs.
> 
> This will allow dropping ARCH_SHMOBILE on ARM in the near future.

Applied, thanks

-- 
~Vinod

* Re: [net-next v3] ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode
From: David Lebrun @ 2018-04-25  9:23 UTC (permalink / raw)
  To: Ahmed Abdelsalam, davem, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <1524594196-12383-1-git-send-email-amsalam20@gmail.com>

On 04/24/2018 07:23 PM, Ahmed Abdelsalam wrote:
> 
> Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>

Acked-by: David Lebrun <dlebrun@google.com>

* Re: [net 1/6] ixgbevf: ensure xdp_ring resources are free'd on error exit
From: Sergei Shtylyov @ 2018-04-25  9:25 UTC (permalink / raw)
  To: Jeff Kirsher, davem; +Cc: Colin Ian King, netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180424192911.22786-2-jeffrey.t.kirsher@intel.com>

Hello!

On 4/24/2018 10:29 PM, Jeff Kirsher wrote:

> From: Colin Ian King <colin.king@canonical.com>
> 
> The current error handling for failed resource setup for xdp_ring
> data is a break out of the loop and returning 0 indicated everything
> was OK, when in fact it is not.  Fix this by exiting via the
> error exit label err_setup_tx that will clean up the resources
> correctly and return and error status.

    s/and/an/

> Detected by CoverityScan, CID#1466879 ("Logically dead code")
> 
> Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[...]
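
For readers unfamiliar with the pattern the commit message describes, here is
a minimal userspace sketch of a setup loop that jumps to an error label
instead of breaking out and returning 0. The names struct ring, setup_one,
free_one and setup_all, and the fail_at parameter, are hypothetical; this is
not the ixgbevf code.

```c
#include <stdlib.h>

/* Hypothetical ring; a real driver ring carries far more state. */
struct ring {
	void *desc;
};

/* Allocate one ring, or fail on request so the error path can be tried. */
int setup_one(struct ring *r, int should_fail)
{
	if (should_fail)
		return -1;
	r->desc = malloc(64);
	return r->desc ? 0 : -1;
}

void free_one(struct ring *r)
{
	free(r->desc);
	r->desc = NULL;
}

/*
 * Set up n rings. fail_at selects a ring whose setup is forced to fail
 * (use -1 for none). On failure, the rings set up so far are freed and
 * the error is returned.
 */
int setup_all(struct ring *rings, int n, int fail_at)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		err = setup_one(&rings[i], i == fail_at);
		if (err)
			goto err_setup;	/* "break" would reach "return 0" */
	}
	return 0;

err_setup:
	while (i--)
		free_one(&rings[i]);
	return err;
}
```

The point of the label is that the failure path both unwinds the earlier
allocations and propagates err, which a bare break followed by return 0
cannot do.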

MBR, Sergei

* Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
From: Leo Yan @ 2018-04-25  9:25 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David S. Miller, Alexei Starovoitov, Kirill Tkhai, netdev,
	linux-kernel
In-Reply-To: <a90e398e-c012-2630-6909-6d413e03cc96@iogearbox.net>

Hi Daniel,

On Wed, Apr 25, 2018 at 11:12:21AM +0200, Daniel Borkmann wrote:
> On 04/25/2018 10:18 AM, Leo Yan wrote:
> > After enabling the BPF_JIT_ALWAYS_ON config, bpf_jit_enable always equals
> > 1; it is impossible to set 'bpf_jit_enable = 2' and the kernel has no
> > chance to call bpf_jit_dump().
> > 
> > This patch relaxes the bpf_jit_enable range to [1..2] when the kernel
> > config BPF_JIT_ALWAYS_ON is enabled, so that the JIT dump can be invoked.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> 
> Is there a specific reason why you need this here instead of retrieving the
> dump from the newer interface available from bpftool (tools/bpf/bpftool/)?
> The bpf_jit_enable = 2 is not recommended these days since it dumps into the
> kernel log which is often readable from unpriv as well. bpftool makes use
> of the BPF_OBJ_GET_INFO_BY_FD interface via bpf syscall to get the JIT dump
> instead when bpf_jit_enable is set.

Thanks for reviewing.

The "JIT compiler" section of Documentation/networking/filter.txt
suggests the steps quoted below, so I tried to set 'bpf_jit_enable = 2'
to dump the JIT code, but it failed.

If there is a security concern, should we remove support for
'bpf_jit_enable = 2' and modify the doc to reflect this change?

---8<---

For JIT developers, doing audits etc, each compile run can output the generated
opcode image into the kernel log via: 

  echo 2 > /proc/sys/net/core/bpf_jit_enable

Example output from dmesg:

[ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f
[ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
[ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
[ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

> > ---
> >  net/core/sysctl_net_core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> > index b1a2c5e..6a39b22 100644
> > --- a/net/core/sysctl_net_core.c
> > +++ b/net/core/sysctl_net_core.c
> > @@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
> >  		.proc_handler	= proc_dointvec_minmax_bpf_enable,
> >  # ifdef CONFIG_BPF_JIT_ALWAYS_ON
> >  		.extra1		= &one,
> > -		.extra2		= &one,
> > +		.extra2		= &two,
> >  # else
> >  		.extra1		= &zero,
> >  		.extra2		= &two,
> > 
> 

* Re: [net 2/6] igb: Fix the transmission mode of queue 0 for Qav mode
From: Sergei Shtylyov @ 2018-04-25  9:27 UTC (permalink / raw)
  To: Jeff Kirsher, davem
  Cc: Vinicius Costa Gomes, netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180424192911.22786-3-jeffrey.t.kirsher@intel.com>

On 4/24/2018 10:29 PM, Jeff Kirsher wrote:

> From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> 
> When Qav mode is enabled, queue 0 should be kept on Stream Reservation

    s/on/in/?

> mode. From the i210 datasheet, section 8.12.19:
> 
> "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to
> Qav." ("QueueMode 1b" represents the Stream Reservation mode)
> 
> The solution is to give queue 0 all the credits it might need, so
> it has priority over queue 1.
> 
> A situation where this can happen is when cbs is "installed" only on
> queue 1, leaving queue 0 alone. For example:
> 
> $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \
>       	   map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> 
> $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \
>       	   hicredit 30 sendslope -980000 idleslope 20000 offload 1
> 
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[...]

MBR, Sergei

* Re: [PATCH net-next 3/4] nfp: flower: support offloading multiple rules with same cookie
From: John Hurley @ 2018-04-25  9:27 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	ASAP_Direct_Dev
In-Reply-To: <CAJ3xEMg0VPbM1bZG-fiNkpgabWBVw2U6uibrXxz+dMaOophVWA@mail.gmail.com>

On Wed, Apr 25, 2018 at 10:13 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 12:02 PM, John Hurley <john.hurley@netronome.com> wrote:
>> On Wed, Apr 25, 2018 at 9:56 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Wed, Apr 25, 2018 at 11:51 AM, John Hurley <john.hurley@netronome.com> wrote:
>>>> On Wed, Apr 25, 2018 at 7:31 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Wed, Apr 25, 2018 at 7:17 AM, Jakub Kicinski
>>>>> <jakub.kicinski@netronome.com> wrote:
>>>>>> From: John Hurley <john.hurley@netronome.com>
>>>>>>
>>>>>> When multiple netdevs are attached to a tc offload block and register for
>>>>>> callbacks, a rule added to the block will be propagated to all netdevs.
>>>>>> Previously these were detected as duplicates (based on cookie) and
>>>>>> rejected. Modify the rule nfp lookup function to optionally include an
>>>>>> ingress netdev and a host context along with the cookie value when
>>>>>> searching for a rule. When a new rule is passed to the driver, the netdev
>>>>>> the rule is to be attached to is considered when searching for duplicates.
>>>>>
>>>>> so if the same rule (cookie) is provided to the driver through multiple ingress
>>>>> devices you will not reject it -- what is the use case for that, is it
>>>>> block sharing?
>>>>
>>>> Hi Or,
>>>> Yes, block sharing is the current use-case.
>>>> Simple example for clarity....
>>>> Here we want to offload the filter to both ingress devs nfp_p0 and nfp_p1:
>>>>
>>>> tc qdisc add dev nfp_p0 ingress_block 22 ingress
>>>> tc qdisc add dev nfp_p1 ingress_block 22 ingress
>>>> tc filter add block 22 protocol ip parent ffff: flower skip_sw \
>>>>     ip_proto tcp action drop
>>>
>>> cool!
>>>
>>> Just out of curiosity, do you actually share this HW rule or you duplicate it?
>>
>> It's duplicated. At HW level the ingress port is part of the match so technically it's
>> a different rule.
>
> I see, we have also a match on the ingress port as part of the HW API, which
> means we will have to apply a similar practice if we want to support
> block sharing quickly.
>
> Just to make sure, under tc block sharing the tc stack calls for hw
> offloading of the same rule (same cookie) multiple times, each with
> different ingress device, right?
>
>
> Or.

So in the example above, each qdisc add registers a callback with the
block. For each callback, the dev used is passed as priv data (presumably
you do something similar). When the filter is added, the block code
triggers all callbacks with the same rule data [1]. We differentiate the
callbacks by their priv data (ingress dev).

[1] https://elixir.bootlin.com/linux/v4.17-rc2/source/net/sched/cls_api.c#L741
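
The flow described above can be modelled in a few lines of userspace C. This
is only a sketch under stated assumptions: struct block, block_cb_register(),
block_add_filter() and count_rule_cb() are hypothetical names, not the tc
core or nfp API, and the priv pointer here is a plain counter standing in for
the ingress device.

```c
#include <stddef.h>

#define MAX_BLOCK_CBS 8

/* Hypothetical callback type: same rule data, per-registration priv. */
typedef int (*block_cb_t)(unsigned long cookie, void *priv);

struct block_cb {
	block_cb_t cb;
	void *priv;	/* stands in for the ingress dev given at registration */
};

struct block {
	struct block_cb cbs[MAX_BLOCK_CBS];
	size_t num_cbs;
};

/* Models one "qdisc add ... ingress_block N": bind a callback to the block. */
int block_cb_register(struct block *b, block_cb_t cb, void *priv)
{
	if (b->num_cbs == MAX_BLOCK_CBS)
		return -1;
	b->cbs[b->num_cbs].cb = cb;
	b->cbs[b->num_cbs].priv = priv;
	b->num_cbs++;
	return 0;
}

/* Models "filter add block N": every callback sees the same cookie. */
int block_add_filter(struct block *b, unsigned long cookie)
{
	for (size_t i = 0; i < b->num_cbs; i++) {
		int err = b->cbs[i].cb(cookie, b->cbs[i].priv);
		if (err)
			return err;
	}
	return 0;
}

/* Example driver callback: priv points at a per-port rule counter. */
int count_rule_cb(unsigned long cookie, void *priv)
{
	(void)cookie;
	++*(int *)priv;
	return 0;
}
```

Registering count_rule_cb twice with two different counters and then adding
one filter increments both counters once, which is the "same cookie, different
priv" situation the driver has to tell apart.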
