* Re: general protection fault in dev_map_hash_update_elem
From: Alexei Starovoitov @ 2019-09-05 21:44 UTC (permalink / raw)
To: syzbot
Cc: bpf, Daniel Borkmann, Jesper Dangaard Brouer, LKML,
Network Development, syzkaller-bugs,
Toke Høiland-Jørgensen
In-Reply-To: <0000000000005091a70591d3e1d9@google.com>
On Thu, Sep 5, 2019 at 1:08 PM syzbot
<syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 6d028043 Add linux-next specific files for 20190830
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=135c1a92600000
> kernel config: https://syzkaller.appspot.com/x/.config?x=82a6bec43ab0cb69
> dashboard link: https://syzkaller.appspot.com/bug?extid=4e7a85b1432052e8d6f8
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=109124e1600000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 10235 Comm: syz-executor.0 Not tainted 5.3.0-rc6-next-20190830
> #75
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
> RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
> RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
> RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
> RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
> Code: 48 89 f1 48 89 75 c8 48 c1 e9 03 80 3c 11 00 0f 85 d3 02 00 00 48 b9
> 00 00 00 00 00 fc ff df 48 8b 53 10 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f
> 85 97 02 00 00 48 85 c0 48 89 02 74 38 48 89 55 b8
> RSP: 0018:ffff88808d607c30 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff8880a7f14580 RCX: dffffc0000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a7f14588
> RBP: ffff88808d607c78 R08: 0000000000000004 R09: ffffed1011ac0f73
> R10: ffffed1011ac0f72 R11: 0000000000000003 R12: ffff88809f4e9400
> R13: ffff88809b06ba00 R14: 0000000000000000 R15: ffff88809f4e9528
> FS: 00007f3a3d50c700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007feb3fcd0000 CR3: 00000000986b9000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> map_update_elem+0xc82/0x10b0 kernel/bpf/syscall.c:966
> __do_sys_bpf+0x8b5/0x3350 kernel/bpf/syscall.c:2854
> __se_sys_bpf kernel/bpf/syscall.c:2825 [inline]
> __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:2825
> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x459879
> Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f3a3d50bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459879
> RDX: 0000000000000020 RSI: 0000000020000040 RDI: 0000000000000002
> RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a3d50c6d4
> R13: 00000000004bfc86 R14: 00000000004d1960 R15: 00000000ffffffff
> Modules linked in:
> ---[ end trace 083223e21dbd0ae5 ]---
> RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
> RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
> RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
> RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
> RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
Toke,
please take a look.
Thanks!
^ permalink raw reply
* Re: [PATCH net-next v4 1/1] net: openvswitch: Set OvS recirc_id from tc chain index
From: Pravin Shelar @ 2019-09-05 21:48 UTC (permalink / raw)
To: Paul Blakey
Cc: Linux Kernel Network Developers, David S. Miller, Justin Pettit,
Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov, Jiri Pirko,
Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo
In-Reply-To: <1567605397-14060-2-git-send-email-paulb@mellanox.com>
On Wed, Sep 4, 2019 at 6:56 AM Paul Blakey <paulb@mellanox.com> wrote:
>
> Offloaded OvS datapath rules are translated one to one to tc rules,
> for example the following simplified OvS rule:
>
> recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
>
> Will be translated to the following tc rule:
>
> $ tc filter add dev dev1 ingress \
> prio 1 chain 0 proto ip \
> flower tcp ct_state -trk \
> action ct pipe \
> action goto chain 2
>
> Received packets will first travel though tc, and if they aren't stolen
> by it, like in the above rule, they will continue to OvS datapath.
> Since we already did some actions (action ct in this case) which might
> modify the packets, and updated action stats, we would like to continue
> the proccessing with the correct recirc_id in OvS (here recirc_id(2))
> where we left off.
>
> To support this, introduce a new skb extension for tc, which
> will be used for translating tc chain to ovs recirc_id to
> handle these miss cases. Last tc chain index will be set
> by tc goto chain action and read by OvS datapath.
>
> Signed-off-by: Paul Blakey <paulb@mellanox.com>
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
Looks good to me.
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Thanks,
Pravin.
^ permalink raw reply
* Re: [PATCH net] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list
From: Alexander Duyck @ 2019-09-05 21:49 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: Daniel Borkmann, Eric Dumazet, Willem de Bruijn, eyal, netdev,
Shmulik Ladkani
In-Reply-To: <20190905183633.8144-1-shmulik.ladkani@gmail.com>
On Thu, Sep 5, 2019 at 11:36 AM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> Historically, support for frag_list packets entering skb_segment() was
> limited to frag_list members terminating on exact same gso_size
> boundaries. This is verified with a BUG_ON since commit 89319d3801d1
> ("net: Add frag_list support to skb_segment"), quote:
>
> As such we require all frag_list members terminate on exact MSS
> boundaries. This is checked using BUG_ON.
> As there should only be one producer in the kernel of such packets,
> namely GRO, this requirement should not be difficult to maintain.
>
> However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
> the "exact MSS boundaries" assumption no longer holds:
> An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
> leaves the frag_list members as originally merged by GRO with the
> original 'gso_size'. Example of such programs are bpf-based NAT46 or
> NAT64.
>
> This lead to a kernel BUG_ON for flows involving:
> - GRO generating a frag_list skb
> - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
> - skb_segment() of the skb
>
> See example BUG_ON reports in [0].
>
> In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
> skb_segment() was modified to support the "gso_size mangling" case of
> a frag_list GRO'ed skb, but *only* for frag_list members having
> head_frag==true (having a page-fragment head).
>
> Alas, GRO packets having frag_list members with a linear kmalloced head
> (head_frag==false) still hit the BUG_ON.
>
> This commit adds support to skb_segment() for a 'head_skb' packet having
> a frag_list whose members are *non* head_frag, with gso_size mangled, by
> disabling SG and thus falling-back to copying the data from the given
> 'head_skb' into the generated segmented skbs - as suggested by Willem de
> Bruijn [1].
>
> Since this approach involves the penalty of skb_copy_and_csum_bits()
> when building the segments, care was taken in order to enable this
> solution only when required:
> - untrusted gso_size, by testing SKB_GSO_DODGY is set
> (SKB_GSO_DODGY is set by any gso_size mangling functions in
> net/core/filter.c)
> - the frag_list is non empty, its item is a non head_frag, *and* the
> headlen of the given 'head_skb' does not match the gso_size.
>
> [0]
> https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
> https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
>
> [1]
> https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
>
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> net/core/skbuff.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ea8e8d332d85..c4bd1881acff 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3678,6 +3678,24 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> sg = !!(features & NETIF_F_SG);
> csum = !!can_checksum_protocol(features, proto);
>
> + if (mss != GSO_BY_FRAGS &&
> + (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
> + /* gso_size is untrusted.
> + *
> + * If head_skb has a frag_list with a linear non head_frag
> + * item, and head_skb's headlen does not fit requested
> + * gso_size, fall back to copying the skbs - by disabling sg.
> + *
> + * We assume checking the first frag suffices, i.e if either of
> + * the frags have non head_frag data, then the first frag is
> + * too.
> + */
> + if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag &&
> + (mss != skb_headlen(head_skb) - doffset)) {
> + sg = false;
> + }
> + }
> +
I would change the order of the tests you use here so that we can
eliminate the possibility of needing to perform many tests for the
more common cases. You could probably swap "list_skb" and "mss !=
GSO_BY_FRAGS" since list_skb is more likely to be false for many of
the common cases such as a standard TSO send from a socket. You might
even consider moving the GSO_BY_FRAGS check toward the end of your
checks since SCTP is the only protocol that I believe uses it and the
likelihood of encountering it is much lower compared to other
protocols.
You could probably test for !list_skb->head_frag before seeing if
there is a headlen since many NICs would be generating frames using
head_frag, so in the GRO case you mentioned above it could probably
save you some effort on a number of NICs.
You might also consider moving this code up before we push the mac
header back on and instead of setting sg to false you could just clear
the NETIF_F_SG flag from features. It would save you from having to
then remove doffset in your last check.
> if (sg && csum && (mss != GSO_BY_FRAGS)) {
> if (!(features & NETIF_F_GSO_PARTIAL)) {
> struct sk_buff *iter;
> --
> 2.19.1
>
^ permalink raw reply
* [pull request][net-next 00/14] Mellanox, mlx5 cleanups & port congestion stats
From: Saeed Mahameed @ 2019-09-05 21:50 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed
Hi Dave,
This series provides 12 mlx5 cleanup patches and last 2 patches provide
port congestion stats to ethtool.
For more information please see tag log below.
Please pull and let me know if there is any problem.
Thanks,
Saeed.
---
The following changes since commit 0e5b36bc4c1fccfc18dd851d960781589c16dae8:
r8152: adjust the settings of ups flags (2019-09-05 12:41:11 +0200)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2019-09-05
for you to fetch changes up to 1297d97f4862ad690d882ae5b0487e3d1ff15953:
net/mlx5e: Add port buffer's congestion counters (2019-09-05 14:44:43 -0700)
----------------------------------------------------------------
mlx5-updates-2019-09-05
1) Allover mlx5 cleanups
2) Added port congestion counters to ethtool stats:
Add 3 counters per priority to ethtool using PPCNT:
2.1) rx_prio[p]_buf_discard - the number of packets discarded by device
due to lack of per host receive buffers
2.2) rx_prio[p]_cong_discard - the number of packets discarded by device
due to per host congestion
2.3) rx_prio[p]_marked - the number of packets ECN marked by device due
to per host congestion
----------------------------------------------------------------
Aya Levin (2):
net/mlx5: Expose HW capability bits for port buffer per priority congestion counters
net/mlx5e: Add port buffer's congestion counters
Colin Ian King (2):
net/mlx5: fix spelling mistake "offlaods" -> "offloads"
net/mlx5: fix missing assignment of variable err
Eran Ben Elisha (1):
net/mlx5e: Fix static checker warning of potential pointer math issue
Mao Wenan (1):
net/mlx5: Kconfig: Fix MLX5_CORE dependency with PCI_HYPERV_INTERFACE
Maxim Mikityanskiy (1):
net/mlx5e: Remove unnecessary clear_bit()s
Roi Dayan (1):
net/mlx5e: Remove leftover declaration
Saeed Mahameed (2):
net/mlx5e: Use ipv6_stub to avoid dependency with ipv6 being a module
net/mlx5: DR, Remove redundant dev_name print from err log
Tariq Toukan (1):
net/mlx5e: kTLS, Remove unused function parameter
Wei Yongjun (2):
net/mlx5: DR, Remove useless set memory to zero use memset()
net/mlx5: DR, Fix error return code in dr_domain_init_resources()
zhong jiang (1):
net/mlx5: Use PTR_ERR_OR_ZERO rather than its implementation
drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 +-
.../ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 9 +-
.../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 6 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 -
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 23 ++--
drivers/net/ethernet/mellanox/mlx5/core/en_rep.h | 1 -
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 149 ++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 +-
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
.../mellanox/mlx5/core/steering/dr_domain.c | 18 ++-
.../ethernet/mellanox/mlx5/core/steering/dr_send.c | 1 -
include/linux/mlx5/device.h | 1 +
include/linux/mlx5/mlx5_ifc.h | 29 +++-
15 files changed, 207 insertions(+), 47 deletions(-)
^ permalink raw reply
* [net-next 01/14] net/mlx5e: Fix static checker warning of potential pointer math issue
From: Saeed Mahameed @ 2019-09-05 21:50 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Eran Ben Elisha, Dan Carpenter,
Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
Cited patch have an issue in WARN_ON_ONCE check, with wrong address ranges
are compared. Fix that by changing pointer types from u64* to void*. This
will also make code simpler to read.
In addition mlx5e_hv_vhca_fill_ring_stats can get void pointer, so remove
the unnecessary casting when calling it.
Found by static checker:
drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c:41 mlx5e_hv_vhca_fill_stats()
warn: potential pointer math issue ('buf' is a u64 pointer)
Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index c37b4acd9bd5..b3a249b2a482 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -30,22 +30,21 @@ mlx5e_hv_vhca_fill_ring_stats(struct mlx5e_priv *priv, int ch,
}
}
-static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, u64 *data,
+static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, void *data,
int buf_len)
{
int ch, i = 0;
for (ch = 0; ch < priv->max_nch; ch++) {
- u64 *buf = data + i;
+ void *buf = data + i;
if (WARN_ON_ONCE(buf +
sizeof(struct mlx5e_hv_vhca_per_ring_stats) >
data + buf_len))
return;
- mlx5e_hv_vhca_fill_ring_stats(priv, ch,
- (struct mlx5e_hv_vhca_per_ring_stats *)buf);
- i += sizeof(struct mlx5e_hv_vhca_per_ring_stats) / sizeof(u64);
+ mlx5e_hv_vhca_fill_ring_stats(priv, ch, buf);
+ i += sizeof(struct mlx5e_hv_vhca_per_ring_stats);
}
}
--
2.21.0
^ permalink raw reply related
* [net-next 02/14] net/mlx5: Kconfig: Fix MLX5_CORE dependency with PCI_HYPERV_INTERFACE
From: Saeed Mahameed @ 2019-09-05 21:50 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Mao Wenan, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Mao Wenan <maowenan@huawei.com>
When MLX5_CORE=y and PCI_HYPERV_INTERFACE=m, below errors are found:
drivers/net/ethernet/mellanox/mlx5/core/en_main.o: In function `mlx5e_nic_enable':
en_main.c:(.text+0xb649): undefined reference to `mlx5e_hv_vhca_stats_create'
drivers/net/ethernet/mellanox/mlx5/core/en_main.o: In function `mlx5e_nic_disable':
en_main.c:(.text+0xb8c4): undefined reference to `mlx5e_hv_vhca_stats_destroy'
Fix this by making MLX5_CORE imply PCI_HYPERV_INTERFACE.
Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 0d8dd885b7d6..a496f2ac20b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -10,6 +10,7 @@ config MLX5_CORE
imply PTP_1588_CLOCK
imply VXLAN
imply MLXFW
+ imply PCI_HYPERV_INTERFACE
default n
---help---
Core driver for low level functionality of the ConnectX-4 and
--
2.21.0
^ permalink raw reply related
* [net-next 04/14] net/mlx5e: Remove leftover declaration
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Roi Dayan, Vlad Buslov, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Roi Dayan <roid@mellanox.com>
This function was removed in the cited commit below.
Fixes: 13e509a4c194 ("net/mlx5e: Remove leftover code from the PF netdev being uplink rep")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index 8e512216deb8..31f83c8adcc9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -183,7 +183,6 @@ struct mlx5e_rep_sq {
struct list_head list;
};
-void *mlx5e_alloc_nic_rep_priv(struct mlx5_core_dev *mdev);
void mlx5e_rep_register_vport_reps(struct mlx5_core_dev *mdev);
void mlx5e_rep_unregister_vport_reps(struct mlx5_core_dev *mdev);
bool mlx5e_is_uplink_rep(struct mlx5e_priv *priv);
--
2.21.0
^ permalink raw reply related
* [net-next 03/14] net/mlx5e: Use ipv6_stub to avoid dependency with ipv6 being a module
From: Saeed Mahameed @ 2019-09-05 21:50 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Saeed Mahameed, Walter Harms, Mark Bloch,
Vlad Buslov
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
mlx5 is dependent on IPv6 tristate since we use ipv6's nd_tbl directly,
alternatively we can use ipv6_stub->nd_tbl and remove the dependency.
Reported-by: Walter Harms <wharms@bfs.de>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/Kconfig | 1 -
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 23 +++++++++++--------
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +-
3 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index a496f2ac20b0..0dba272a5b2f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -33,7 +33,6 @@ config MLX5_FPGA
config MLX5_CORE_EN
bool "Mellanox 5th generation network adapters (ConnectX series) Ethernet support"
depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE
- depends on IPV6=y || IPV6=n || MLX5_CORE=m
select PAGE_POOL
select DIMLIB
default n
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 1623cd32f303..95892a3b63a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -38,6 +38,7 @@
#include <net/netevent.h>
#include <net/arp.h>
#include <net/devlink.h>
+#include <net/ipv6_stubs.h>
#include "eswitch.h"
#include "en.h"
@@ -499,16 +500,18 @@ void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
mlx5e_sqs2vport_stop(esw, rep);
}
+static unsigned long mlx5e_rep_ipv6_interval(void)
+{
+ if (IS_ENABLED(CONFIG_IPV6) && ipv6_stub->nd_tbl)
+ return NEIGH_VAR(&ipv6_stub->nd_tbl->parms, DELAY_PROBE_TIME);
+
+ return ~0UL;
+}
+
static void mlx5e_rep_neigh_update_init_interval(struct mlx5e_rep_priv *rpriv)
{
-#if IS_ENABLED(CONFIG_IPV6)
- unsigned long ipv6_interval = NEIGH_VAR(&nd_tbl.parms,
- DELAY_PROBE_TIME);
-#else
- unsigned long ipv6_interval = ~0UL;
-#endif
- unsigned long ipv4_interval = NEIGH_VAR(&arp_tbl.parms,
- DELAY_PROBE_TIME);
+ unsigned long ipv4_interval = NEIGH_VAR(&arp_tbl.parms, DELAY_PROBE_TIME);
+ unsigned long ipv6_interval = mlx5e_rep_ipv6_interval();
struct net_device *netdev = rpriv->netdev;
struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -917,7 +920,7 @@ static int mlx5e_rep_netevent_event(struct notifier_block *nb,
case NETEVENT_NEIGH_UPDATE:
n = ptr;
#if IS_ENABLED(CONFIG_IPV6)
- if (n->tbl != &nd_tbl && n->tbl != &arp_tbl)
+ if (n->tbl != ipv6_stub->nd_tbl && n->tbl != &arp_tbl)
#else
if (n->tbl != &arp_tbl)
#endif
@@ -944,7 +947,7 @@ static int mlx5e_rep_netevent_event(struct notifier_block *nb,
* done per device delay prob time parameter.
*/
#if IS_ENABLED(CONFIG_IPV6)
- if (!p->dev || (p->tbl != &nd_tbl && p->tbl != &arp_tbl))
+ if (!p->dev || (p->tbl != ipv6_stub->nd_tbl && p->tbl != &arp_tbl))
#else
if (!p->dev || p->tbl != &arp_tbl)
#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 30d26eba75a3..98d1f7a48304 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1492,7 +1492,7 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
tbl = &arp_tbl;
#if IS_ENABLED(CONFIG_IPV6)
else if (m_neigh->family == AF_INET6)
- tbl = &nd_tbl;
+ tbl = ipv6_stub->nd_tbl;
#endif
else
return;
--
2.21.0
^ permalink raw reply related
* [net-next 05/14] net/mlx5: fix spelling mistake "offlaods" -> "offloads"
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Colin Ian King, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Colin Ian King <colin.king@canonical.com>
There is a spelling mistake in a NL_SET_ERR_MSG_MOD error message.
Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 7bf7b6fbc776..381925c90d94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -133,7 +133,7 @@ static int mlx5_devlink_fs_mode_validate(struct devlink *devlink, u32 id,
else if (eswitch_mode == MLX5_ESWITCH_OFFLOADS) {
NL_SET_ERR_MSG_MOD(extack,
- "Software managed steering is not supported when eswitch offlaods enabled.");
+ "Software managed steering is not supported when eswitch offloads enabled.");
err = -EOPNOTSUPP;
}
} else {
--
2.21.0
^ permalink raw reply related
* [net-next 06/14] net/mlx5: fix missing assignment of variable err
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Colin Ian King, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Colin Ian King <colin.king@canonical.com>
The error return from a call to mlx5_flow_namespace_set_peer is not
being assigned to variable err and hence the error check following
the call is currently not working. Fix this by assigning ret as
intended.
Addresses-Coverity: ("Logically dead code")
Fixes: 8463daf17e80 ("net/mlx5: Add support to use SMFS in switchdev mode")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index afa623b15a38..00d71db15f22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1651,7 +1651,7 @@ static int mlx5_esw_offloads_set_ns_peer(struct mlx5_eswitch *esw,
if (err)
return err;
- mlx5_flow_namespace_set_peer(peer_ns, ns);
+ err = mlx5_flow_namespace_set_peer(peer_ns, ns);
if (err) {
mlx5_flow_namespace_set_peer(ns, NULL);
return err;
--
2.21.0
^ permalink raw reply related
* [net-next 07/14] net/mlx5: Use PTR_ERR_OR_ZERO rather than its implementation
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, zhong jiang, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: zhong jiang <zhongjiang@huawei.com>
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR. It is better
to use it directly. hence just replace it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 98d1f7a48304..da7555fdb4d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -988,10 +988,7 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
&flow_act, dest, dest_ix);
mutex_unlock(&priv->fs.tc.t_lock);
- if (IS_ERR(flow->rule[0]))
- return PTR_ERR(flow->rule[0]);
-
- return 0;
+ return PTR_ERR_OR_ZERO(flow->rule[0]);
}
static void mlx5e_tc_del_nic_flow(struct mlx5e_priv *priv,
--
2.21.0
^ permalink raw reply related
* [net-next 08/14] net/mlx5e: kTLS, Remove unused function parameter
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Tariq Toukan, Eran Ben Elisha,
Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Tariq Toukan <tariqt@mellanox.com>
SKB parameter is no longer used in tx_post_resync_dump(),
remove it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
index e5222d17df35..d195366461c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
@@ -256,8 +256,7 @@ struct mlx5e_dump_wqe {
};
static int
-tx_post_resync_dump(struct mlx5e_txqsq *sq, struct sk_buff *skb,
- skb_frag_t *frag, u32 tisn, bool first)
+tx_post_resync_dump(struct mlx5e_txqsq *sq, skb_frag_t *frag, u32 tisn, bool first)
{
struct mlx5_wqe_ctrl_seg *cseg;
struct mlx5_wqe_data_seg *dseg;
@@ -371,8 +370,7 @@ mlx5e_ktls_tx_handle_ooo(struct mlx5e_ktls_offload_context_tx *priv_tx,
tx_post_resync_params(sq, priv_tx, info.rcd_sn);
for (i = 0; i < info.nr_frags; i++)
- if (tx_post_resync_dump(sq, skb, info.frags[i],
- priv_tx->tisn, !i))
+ if (tx_post_resync_dump(sq, info.frags[i], priv_tx->tisn, !i))
goto err_out;
/* If no dump WQE was sent, we need to have a fence NOP WQE before the
--
2.21.0
^ permalink raw reply related
* [net-next 10/14] net/mlx5: DR, Remove useless set memory to zero use memset()
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Wei Yongjun, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Wei Yongjun <weiyongjun1@huawei.com>
The memory return by kzalloc() has already be set to zero, so
remove useless memset(0).
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
index ef0dea44f3b3..5df8436b2ae3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
@@ -899,7 +899,6 @@ int mlx5dr_send_ring_alloc(struct mlx5dr_domain *dmn)
goto clean_qp;
}
- memset(dmn->send_ring->buf, 0, size);
dmn->send_ring->buf_size = size;
dmn->send_ring->mr = dr_reg_mr(dmn->mdev,
--
2.21.0
^ permalink raw reply related
* [net-next 11/14] net/mlx5: DR, Fix error return code in dr_domain_init_resources()
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Wei Yongjun, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Wei Yongjun <weiyongjun1@huawei.com>
Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 4ec9e7b02697 ("net/mlx5: DR, Expose steering domain functionality")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
index 3b9cf0bccf4d..461cc2c30538 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
@@ -66,6 +66,7 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
dmn->uar = mlx5_get_uars_page(dmn->mdev);
if (!dmn->uar) {
mlx5dr_err(dmn, "Couldn't allocate UAR\n");
+ ret = -ENOMEM;
goto clean_pd;
}
@@ -73,6 +74,7 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
if (!dmn->ste_icm_pool) {
mlx5dr_err(dmn, "Couldn't get icm memory for %s\n",
dev_name(dmn->mdev->device));
+ ret = -ENOMEM;
goto clean_uar;
}
@@ -80,6 +82,7 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
if (!dmn->action_icm_pool) {
mlx5dr_err(dmn, "Couldn't get action icm memory for %s\n",
dev_name(dmn->mdev->device));
+ ret = -ENOMEM;
goto free_ste_icm_pool;
}
--
2.21.0
^ permalink raw reply related
* [net-next 09/14] net/mlx5e: Remove unnecessary clear_bit()s
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Maxim Mikityanskiy, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Maxim Mikityanskiy <maximmi@mellanox.com>
Don't clear MLX5E_SQ_STATE_ENABLED on error in mlx5e_open_txqsq and
mlx5e_open_icosq, because it's not set there, and is 0 by default.
Fixes: acc6c5953af1 ("net/mlx5e: Split open/close channels to stages")
Fixes: 9d18b5144a0a ("net/mlx5e: Split open/close ICOSQ into stages")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index dadadf221087..cd51cd56484b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1315,7 +1315,6 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
return 0;
err_free_txqsq:
- clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
mlx5e_free_txqsq(sq);
return err;
@@ -1403,7 +1402,6 @@ int mlx5e_open_icosq(struct mlx5e_channel *c, struct mlx5e_params *params,
return 0;
err_free_icosq:
- clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
mlx5e_free_icosq(sq);
return err;
--
2.21.0
^ permalink raw reply related
* [net-next 12/14] net/mlx5: DR, Remove redundant dev_name print from err log
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev@vger.kernel.org, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
mlx5_core_err already prints the name of the device.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../mellanox/mlx5/core/steering/dr_domain.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
index 461cc2c30538..5b24732b18c0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
@@ -72,24 +72,21 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
dmn->ste_icm_pool = mlx5dr_icm_pool_create(dmn, DR_ICM_TYPE_STE);
if (!dmn->ste_icm_pool) {
- mlx5dr_err(dmn, "Couldn't get icm memory for %s\n",
- dev_name(dmn->mdev->device));
+ mlx5dr_err(dmn, "Couldn't get icm memory\n");
ret = -ENOMEM;
goto clean_uar;
}
dmn->action_icm_pool = mlx5dr_icm_pool_create(dmn, DR_ICM_TYPE_MODIFY_ACTION);
if (!dmn->action_icm_pool) {
- mlx5dr_err(dmn, "Couldn't get action icm memory for %s\n",
- dev_name(dmn->mdev->device));
+ mlx5dr_err(dmn, "Couldn't get action icm memory\n");
ret = -ENOMEM;
goto free_ste_icm_pool;
}
ret = mlx5dr_send_ring_alloc(dmn);
if (ret) {
- mlx5dr_err(dmn, "Couldn't create send-ring for %s\n",
- dev_name(dmn->mdev->device));
+ mlx5dr_err(dmn, "Couldn't create send-ring\n");
goto free_action_icm_pool;
}
@@ -312,16 +309,14 @@ mlx5dr_domain_create(struct mlx5_core_dev *mdev, enum mlx5dr_domain_type type)
dmn->info.caps.log_icm_size);
if (!dmn->info.supp_sw_steering) {
- mlx5dr_err(dmn, "SW steering not supported for %s\n",
- dev_name(mdev->device));
+ mlx5dr_err(dmn, "SW steering is not supported\n");
goto uninit_caps;
}
/* Allocate resources */
ret = dr_domain_init_resources(dmn);
if (ret) {
- mlx5dr_err(dmn, "Failed init domain resources for %s\n",
- dev_name(mdev->device));
+ mlx5dr_err(dmn, "Failed init domain resources\n");
goto uninit_caps;
}
--
2.21.0
^ permalink raw reply related
* [net-next 13/14] net/mlx5: Expose HW capability bits for port buffer per priority congestion counters
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Aya Levin, Moshe Shemesh, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Aya Levin <ayal@mellanox.com>
Map capability bit indicating that HCA supports port buffer's congestion
counters. Also map registers with the corresponding counters.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
include/linux/mlx5/device.h | 1 +
include/linux/mlx5/mlx5_ifc.h | 29 ++++++++++++++++++++++++-----
2 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 8dd081051a79..f3773e8536bb 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1316,6 +1316,7 @@ enum {
MLX5_PER_PRIORITY_COUNTERS_GROUP = 0x10,
MLX5_PER_TRAFFIC_CLASS_COUNTERS_GROUP = 0x11,
MLX5_PHYSICAL_LAYER_COUNTERS_GROUP = 0x12,
+ MLX5_PER_TRAFFIC_CLASS_CONGESTION_GROUP = 0x13,
MLX5_PHYSICAL_LAYER_STATISTICAL_GROUP = 0x16,
MLX5_INFINIBAND_PORT_COUNTERS_GROUP = 0x20,
};
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 7d65c0578ac9..a487b681b516 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1196,7 +1196,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 rts2rts_qp_counters_set_id[0x1];
u8 reserved_at_16a[0x2];
u8 vnic_env_int_rq_oob[0x1];
- u8 reserved_at_16d[0x2];
+ u8 sbcam_reg[0x1];
+ u8 reserved_at_16e[0x1];
u8 qcam_reg[0x1];
u8 gid_table_size[0x10];
@@ -1960,12 +1961,28 @@ struct mlx5_ifc_ib_port_cntrs_grp_data_layout_bits {
u8 port_xmit_wait[0x20];
};
-struct mlx5_ifc_eth_per_traffic_grp_data_layout_bits {
+struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits {
u8 transmit_queue_high[0x20];
u8 transmit_queue_low[0x20];
- u8 reserved_at_40[0x780];
+ u8 no_buffer_discard_uc_high[0x20];
+
+ u8 no_buffer_discard_uc_low[0x20];
+
+ u8 reserved_at_80[0x740];
+};
+
+struct mlx5_ifc_eth_per_tc_congest_prio_grp_data_layout_bits {
+ u8 wred_discard_high[0x20];
+
+ u8 wred_discard_low[0x20];
+
+ u8 ecn_marked_tc_high[0x20];
+
+ u8 ecn_marked_tc_low[0x20];
+
+ u8 reserved_at_80[0x740];
};
struct mlx5_ifc_eth_per_prio_grp_data_layout_bits {
@@ -3642,7 +3659,8 @@ union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits {
struct mlx5_ifc_eth_3635_cntrs_grp_data_layout_bits eth_3635_cntrs_grp_data_layout;
struct mlx5_ifc_eth_extended_cntrs_grp_data_layout_bits eth_extended_cntrs_grp_data_layout;
struct mlx5_ifc_eth_per_prio_grp_data_layout_bits eth_per_prio_grp_data_layout;
- struct mlx5_ifc_eth_per_traffic_grp_data_layout_bits eth_per_traffic_grp_data_layout;
+ struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits eth_per_tc_prio_grp_data_layout;
+ struct mlx5_ifc_eth_per_tc_congest_prio_grp_data_layout_bits eth_per_tc_congest_prio_grp_data_layout;
struct mlx5_ifc_ib_port_cntrs_grp_data_layout_bits ib_port_cntrs_grp_data_layout;
struct mlx5_ifc_phys_layer_cntrs_bits phys_layer_cntrs;
struct mlx5_ifc_phys_layer_statistical_cntrs_bits phys_layer_statistical_cntrs;
@@ -9422,7 +9440,8 @@ union mlx5_ifc_ports_control_registers_document_bits {
struct mlx5_ifc_eth_802_3_cntrs_grp_data_layout_bits eth_802_3_cntrs_grp_data_layout;
struct mlx5_ifc_eth_extended_cntrs_grp_data_layout_bits eth_extended_cntrs_grp_data_layout;
struct mlx5_ifc_eth_per_prio_grp_data_layout_bits eth_per_prio_grp_data_layout;
- struct mlx5_ifc_eth_per_traffic_grp_data_layout_bits eth_per_traffic_grp_data_layout;
+ struct mlx5_ifc_eth_per_tc_prio_grp_data_layout_bits eth_per_tc_prio_grp_data_layout;
+ struct mlx5_ifc_eth_per_tc_congest_prio_grp_data_layout_bits eth_per_tc_congest_prio_grp_data_layout;
struct mlx5_ifc_lane_2_module_mapping_bits lane_2_module_mapping;
struct mlx5_ifc_pamp_reg_bits pamp_reg;
struct mlx5_ifc_paos_reg_bits paos_reg;
--
2.21.0
^ permalink raw reply related
* [net-next 14/14] net/mlx5e: Add port buffer's congestion counters
From: Saeed Mahameed @ 2019-09-05 21:51 UTC (permalink / raw)
To: David S. Miller
Cc: netdev@vger.kernel.org, Aya Levin, Moshe Shemesh, Saeed Mahameed
In-Reply-To: <20190905215034.22713-1-saeedm@mellanox.com>
From: Aya Levin <ayal@mellanox.com>
Add 3 counters per priority to ethtool using PPCNT:
1) rx_prio[p]_buf_discard - the number of packets discarded by device
due to lack of per host receive buffers
2) rx_prio[p]_cong_discard - the number of packets discarded by device
due to per host congestion
3) rx_prio[p]_marked - the number of packets ECN marked by device due
to per host congestion
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../ethernet/mellanox/mlx5/core/en_stats.c | 149 +++++++++++++++++-
.../ethernet/mellanox/mlx5/core/en_stats.h | 2 +
2 files changed, 150 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index f1065e78086a..ac6fdcda7019 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -981,6 +981,147 @@ static void mlx5e_grp_pcie_update_stats(struct mlx5e_priv *priv)
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_MPCNT, 0, 0);
}
+#define PPORT_PER_TC_PRIO_OFF(c) \
+ MLX5_BYTE_OFF(ppcnt_reg, \
+ counter_set.eth_per_tc_prio_grp_data_layout.c##_high)
+
+static const struct counter_desc pport_per_tc_prio_stats_desc[] = {
+ { "rx_prio%d_buf_discard", PPORT_PER_TC_PRIO_OFF(no_buffer_discard_uc) },
+};
+
+#define NUM_PPORT_PER_TC_PRIO_COUNTERS ARRAY_SIZE(pport_per_tc_prio_stats_desc)
+
+#define PPORT_PER_TC_CONGEST_PRIO_OFF(c) \
+ MLX5_BYTE_OFF(ppcnt_reg, \
+ counter_set.eth_per_tc_congest_prio_grp_data_layout.c##_high)
+
+static const struct counter_desc pport_per_tc_congest_prio_stats_desc[] = {
+ { "rx_prio%d_cong_discard", PPORT_PER_TC_CONGEST_PRIO_OFF(wred_discard) },
+ { "rx_prio%d_marked", PPORT_PER_TC_CONGEST_PRIO_OFF(ecn_marked_tc) },
+};
+
+#define NUM_PPORT_PER_TC_CONGEST_PRIO_COUNTERS \
+ ARRAY_SIZE(pport_per_tc_congest_prio_stats_desc)
+
+static int mlx5e_grp_per_tc_prio_get_num_stats(struct mlx5e_priv *priv)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+
+ if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+ return 0;
+
+ return NUM_PPORT_PER_TC_PRIO_COUNTERS * NUM_PPORT_PRIO;
+}
+
+static int mlx5e_grp_per_port_buffer_congest_fill_strings(struct mlx5e_priv *priv,
+ u8 *data, int idx)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ int i, prio;
+
+ if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+ return idx;
+
+ for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
+ for (i = 0; i < NUM_PPORT_PER_TC_PRIO_COUNTERS; i++)
+ sprintf(data + (idx++) * ETH_GSTRING_LEN,
+ pport_per_tc_prio_stats_desc[i].format, prio);
+ for (i = 0; i < NUM_PPORT_PER_TC_CONGEST_PRIO_COUNTERS; i++)
+ sprintf(data + (idx++) * ETH_GSTRING_LEN,
+ pport_per_tc_congest_prio_stats_desc[i].format, prio);
+ }
+
+ return idx;
+}
+
+static int mlx5e_grp_per_port_buffer_congest_fill_stats(struct mlx5e_priv *priv,
+ u64 *data, int idx)
+{
+ struct mlx5e_pport_stats *pport = &priv->stats.pport;
+ struct mlx5_core_dev *mdev = priv->mdev;
+ int i, prio;
+
+ if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+ return idx;
+
+ for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
+ for (i = 0; i < NUM_PPORT_PER_TC_PRIO_COUNTERS; i++)
+ data[idx++] =
+ MLX5E_READ_CTR64_BE(&pport->per_tc_prio_counters[prio],
+ pport_per_tc_prio_stats_desc, i);
+ for (i = 0; i < NUM_PPORT_PER_TC_CONGEST_PRIO_COUNTERS ; i++)
+ data[idx++] =
+ MLX5E_READ_CTR64_BE(&pport->per_tc_congest_prio_counters[prio],
+ pport_per_tc_congest_prio_stats_desc, i);
+ }
+
+ return idx;
+}
+
+static void mlx5e_grp_per_tc_prio_update_stats(struct mlx5e_priv *priv)
+{
+ struct mlx5e_pport_stats *pstats = &priv->stats.pport;
+ struct mlx5_core_dev *mdev = priv->mdev;
+ u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {};
+ int sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
+ void *out;
+ int prio;
+
+ if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+ return;
+
+ MLX5_SET(ppcnt_reg, in, pnat, 2);
+ MLX5_SET(ppcnt_reg, in, grp, MLX5_PER_TRAFFIC_CLASS_COUNTERS_GROUP);
+ for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
+ out = pstats->per_tc_prio_counters[prio];
+ MLX5_SET(ppcnt_reg, in, prio_tc, prio);
+ mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
+ }
+}
+
+static int mlx5e_grp_per_tc_congest_prio_get_num_stats(struct mlx5e_priv *priv)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+
+ if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+ return 0;
+
+ return NUM_PPORT_PER_TC_CONGEST_PRIO_COUNTERS * NUM_PPORT_PRIO;
+}
+
+static void mlx5e_grp_per_tc_congest_prio_update_stats(struct mlx5e_priv *priv)
+{
+ struct mlx5e_pport_stats *pstats = &priv->stats.pport;
+ struct mlx5_core_dev *mdev = priv->mdev;
+ u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {};
+ int sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
+ void *out;
+ int prio;
+
+ if (!MLX5_CAP_GEN(mdev, sbcam_reg))
+ return;
+
+ MLX5_SET(ppcnt_reg, in, pnat, 2);
+ MLX5_SET(ppcnt_reg, in, grp, MLX5_PER_TRAFFIC_CLASS_CONGESTION_GROUP);
+ for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
+ out = pstats->per_tc_congest_prio_counters[prio];
+ MLX5_SET(ppcnt_reg, in, prio_tc, prio);
+ mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 0);
+ }
+}
+
+static int mlx5e_grp_per_port_buffer_congest_get_num_stats(struct mlx5e_priv *priv)
+{
+ return mlx5e_grp_per_tc_prio_get_num_stats(priv) +
+ mlx5e_grp_per_tc_congest_prio_get_num_stats(priv);
+}
+
+static void mlx5e_grp_per_port_buffer_congest_update_stats(struct mlx5e_priv *priv)
+{
+ mlx5e_grp_per_tc_prio_update_stats(priv);
+ mlx5e_grp_per_tc_congest_prio_update_stats(priv);
+}
+
#define PPORT_PER_PRIO_OFF(c) \
MLX5_BYTE_OFF(ppcnt_reg, \
counter_set.eth_per_prio_grp_data_layout.c##_high)
@@ -1610,7 +1751,13 @@ const struct mlx5e_stats_grp mlx5e_stats_grps[] = {
.get_num_stats = mlx5e_grp_channels_get_num_stats,
.fill_strings = mlx5e_grp_channels_fill_strings,
.fill_stats = mlx5e_grp_channels_fill_stats,
- }
+ },
+ {
+ .get_num_stats = mlx5e_grp_per_port_buffer_congest_get_num_stats,
+ .fill_strings = mlx5e_grp_per_port_buffer_congest_fill_strings,
+ .fill_stats = mlx5e_grp_per_port_buffer_congest_fill_stats,
+ .update_stats = mlx5e_grp_per_port_buffer_congest_update_stats,
+ },
};
const int mlx5e_num_stats_grps = ARRAY_SIZE(mlx5e_stats_grps);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index c281e567711d..79f261bf86ac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -207,6 +207,8 @@ struct mlx5e_pport_stats {
__be64 phy_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
__be64 phy_statistical_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
__be64 eth_ext_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
+ __be64 per_tc_prio_counters[NUM_PPORT_PRIO][MLX5_ST_SZ_QW(ppcnt_reg)];
+ __be64 per_tc_congest_prio_counters[NUM_PPORT_PRIO][MLX5_ST_SZ_QW(ppcnt_reg)];
};
#define PCIE_PERF_GET(pcie_stats, c) \
--
2.21.0
^ permalink raw reply related
* Re: [PATCH net] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list
From: Willem de Bruijn @ 2019-09-05 21:51 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: Daniel Borkmann, Eric Dumazet, eyal, netdev, Shmulik Ladkani,
Alexander Duyck
In-Reply-To: <20190905183633.8144-1-shmulik.ladkani@gmail.com>
On Thu, Sep 5, 2019 at 2:36 PM Shmulik Ladkani <shmulik@metanetworks.com> wrote:
>
> Historically, support for frag_list packets entering skb_segment() was
> limited to frag_list members terminating on exact same gso_size
> boundaries. This is verified with a BUG_ON since commit 89319d3801d1
> ("net: Add frag_list support to skb_segment"), quote:
>
> As such we require all frag_list members terminate on exact MSS
> boundaries. This is checked using BUG_ON.
> As there should only be one producer in the kernel of such packets,
> namely GRO, this requirement should not be difficult to maintain.
>
> However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
> the "exact MSS boundaries" assumption no longer holds:
> An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
> leaves the frag_list members as originally merged by GRO with the
> original 'gso_size'. Example of such programs are bpf-based NAT46 or
> NAT64.
>
> This lead to a kernel BUG_ON for flows involving:
> - GRO generating a frag_list skb
> - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
> - skb_segment() of the skb
>
> See example BUG_ON reports in [0].
>
> In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
> skb_segment() was modified to support the "gso_size mangling" case of
> a frag_list GRO'ed skb, but *only* for frag_list members having
> head_frag==true (having a page-fragment head).
>
> Alas, GRO packets having frag_list members with a linear kmalloced head
> (head_frag==false) still hit the BUG_ON.
>
> This commit adds support to skb_segment() for a 'head_skb' packet having
> a frag_list whose members are *non* head_frag, with gso_size mangled, by
> disabling SG and thus falling-back to copying the data from the given
> 'head_skb' into the generated segmented skbs - as suggested by Willem de
> Bruijn [1].
>
> Since this approach involves the penalty of skb_copy_and_csum_bits()
> when building the segments, care was taken in order to enable this
> solution only when required:
> - untrusted gso_size, by testing SKB_GSO_DODGY is set
> (SKB_GSO_DODGY is set by any gso_size mangling functions in
> net/core/filter.c)
> - the frag_list is non empty, its item is a non head_frag, *and* the
> headlen of the given 'head_skb' does not match the gso_size.
>
> [0]
> https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
> https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
>
> [1]
> https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
>
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> net/core/skbuff.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ea8e8d332d85..c4bd1881acff 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3678,6 +3678,24 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> sg = !!(features & NETIF_F_SG);
> csum = !!can_checksum_protocol(features, proto);
>
> + if (mss != GSO_BY_FRAGS &&
> + (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
> + /* gso_size is untrusted.
> + *
> + * If head_skb has a frag_list with a linear non head_frag
> + * item, and head_skb's headlen does not fit requested
> + * gso_size, fall back to copying the skbs - by disabling sg.
> + *
> + * We assume checking the first frag suffices, i.e if either of
> + * the frags have non head_frag data, then the first frag is
> + * too.
> + */
> + if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag &&
> + (mss != skb_headlen(head_skb) - doffset)) {
I thought the idea was to check skb_headlen(list_skb), as that is the
cause of the problem. Is skb_headlen(head_skb) a good predictor of
that? I can certainly imagine that it is, just not sure.
Thanks for preparing the patch, and explaining the problem and
solution clearly in the commit message. I'm pretty sure I'll have
forgotten the finer details next time we have to look at this
function again.
^ permalink raw reply
* Re: [PATCH v2 bpf-next 2/3] bpf: implement CAP_BPF
From: Alexei Starovoitov @ 2019-09-05 22:00 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, nicolas.dichtel@6wind.com, Alexei Starovoitov,
luto@amacapital.net, davem@davemloft.net, peterz@infradead.org,
rostedt@goodmis.org, netdev@vger.kernel.org, bpf@vger.kernel.org,
Kernel Team, linux-api@vger.kernel.org
In-Reply-To: <acc09eaf-5289-9457-3ce1-f27efb6013b8@iogearbox.net>
On Thu, Sep 05, 2019 at 10:37:03AM +0200, Daniel Borkmann wrote:
> On 9/4/19 5:21 PM, Alexei Starovoitov wrote:
> > On 9/4/19 8:16 AM, Daniel Borkmann wrote:
> > > opening/creating BPF maps" error="Unable to create map
> > > /run/cilium/bpffs/tc/globals/cilium_lxc: operation not permitted"
> > > subsys=daemon
> > > 2019-09-04T14:11:47.28178666Z level=fatal msg="Error while creating
> > > daemon" error="Unable to create map
> > > /run/cilium/bpffs/tc/globals/cilium_lxc: operation not permitted"
> > > subsys=daemon
> >
> > Ok. We have to include caps in both cap_sys_admin and cap_bpf then.
> >
> > > And /same/ deployment with reverted patches, hence no CAP_BPF gets it up
> > > and running again:
> > >
> > > # kubectl get pods --all-namespaces -o wide
> >
> > Can you share what this magic commands do underneath?
>
> What do you mean by magic commands? Latter is showing all pods in the cluster:
>
> https://kubernetes.io/docs/reference/kubectl/cheatsheet/#viewing-finding-resources
"magic" in a sense that I have no idea how k8s "services" and "pods" map
to kernel namespaces.
> Checking moby go code, it seems to exec with GetAllCapabilities which returns
> all of the capabilities it is aware of, and afaics, they seem to be using
> the below go library to get the hard-coded list from where obviously CAP_BPF
> is unknown which might also explain the breakage I've been seeing:
>
> https://github.com/syndtr/gocapability/blob/33e07d32887e1e06b7c025f27ce52f62c7990bc0/capability/enum_gen.go
thanks for the link.
That library is much better written than libcap.
capability_linux.go is reading cap_last_cap dynamically and it can understand
proposed CAP_BPF, CAP_TRACING without need o be recompiled (unlike libcap).
So passing new caps to k8s should be trivial. The library won't know
their names, but passing by number looks to be already supported.
I'm still not sure which part of k8s setup clears the caps and
why this v2 set doesn't work, but that doesn't matter any more.
I believe I addressed this compat issue in v3 set.
Could you please give a try just repeating your previous commands?
^ permalink raw reply
* RE: [PATCH net-next, 2/2] hv_netvsc: Sync offloading features to VF NIC
From: Haiyang Zhang @ 2019-09-05 23:07 UTC (permalink / raw)
To: Jakub Kicinski
Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
olaf@aepfle.de, vkuznets, davem@davemloft.net,
linux-kernel@vger.kernel.org, Mark Bloch
In-Reply-To: <20190830160451.43a61cf9@cakuba.netronome.com>
> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Friday, August 30, 2019 7:05 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; linux-
> kernel@vger.kernel.org; Mark Bloch <markb@mellanox.com>
> Subject: Re: [PATCH net-next, 2/2] hv_netvsc: Sync offloading features to VF
> NIC
>
> On Fri, 30 Aug 2019 03:45:38 +0000, Haiyang Zhang wrote:
> > VF NIC may go down then come up during host servicing events. This
> > causes the VF NIC offloading feature settings to roll back to the
> > defaults. This patch can synchronize features from synthetic NIC to
> > the VF NIC during ndo_set_features (ethtool -K), and
> > netvsc_register_vf when VF comes back after host events.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: Mark Bloch <markb@mellanox.com>
>
> If we want to make this change in behaviour we should change net_failover
> at the same time.
After checking the net_failover, I found it's for virtio based SRIOV, and very
different from what we did for Hyper-V based SRIOV.
We let the netvsc driver acts as both the synthetic (PV) driver and the transparent
bonding master for the VF NIC. But net_failover acts as a master device on top
of both virtio PV NIC, and VF NIC. And the net_failover doesn't implemented
operations, like ndo_set_features.
So the code change for our netvsc driver cannot be applied to net_failover driver.
I will re-submit my two patches (fixing the extra tab in the 1st one as you pointed
out). Thanks!
- Haiyang
^ permalink raw reply
* [PATCH net-next,v2, 0/2] Enable sg as tunable, sync offload settings to VF NIC
From: Haiyang Zhang @ 2019-09-05 23:22 UTC (permalink / raw)
To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org
This patch set fixes an issue in SG tuning, and sync
offload settings from synthetic NIC to VF NIC.
Haiyang Zhang (2):
hv_netvsc: hv_netvsc: Allow scatter-gather feature to be tunable
hv_netvsc: Sync offloading features to VF NIC
drivers/net/hyperv/hyperv_net.h | 2 +-
drivers/net/hyperv/netvsc_drv.c | 26 ++++++++++++++++++++++----
drivers/net/hyperv/rndis_filter.c | 1 +
3 files changed, 24 insertions(+), 5 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next,v2, 1/2] hv_netvsc: Allow scatter-gather feature to be tunable
From: Haiyang Zhang @ 2019-09-05 23:23 UTC (permalink / raw)
To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org
In-Reply-To: <1567725722-33552-1-git-send-email-haiyangz@microsoft.com>
In a previous patch, the NETIF_F_SG was missing after the code changes.
That caused the SG feature to be "fixed". This patch includes it into
hw_features, so it is tunable again.
Fixes: 23312a3be999 ("netvsc: negotiate checksum and segmentation parameters")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/hyperv/hyperv_net.h | 2 +-
drivers/net/hyperv/netvsc_drv.c | 4 ++--
drivers/net/hyperv/rndis_filter.c | 1 +
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ecc9af0..670ef68 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -822,7 +822,7 @@ struct nvsp_message {
#define NETVSC_SUPPORTED_HW_FEATURES (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | \
NETIF_F_TSO | NETIF_F_IPV6_CSUM | \
- NETIF_F_TSO6 | NETIF_F_LRO)
+ NETIF_F_TSO6 | NETIF_F_LRO | NETIF_F_SG)
#define VRSS_SEND_TAB_SIZE 16 /* must be power of 2 */
#define VRSS_CHANNEL_MAX 64
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 0a6cd2f..1f1192e 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2313,8 +2313,8 @@ static int netvsc_probe(struct hv_device *dev,
/* hw_features computed in rndis_netdev_set_hwcaps() */
net->features = net->hw_features |
- NETIF_F_HIGHDMA | NETIF_F_SG |
- NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
+ NETIF_F_HIGHDMA | NETIF_F_HW_VLAN_CTAG_TX |
+ NETIF_F_HW_VLAN_CTAG_RX;
net->vlan_features = net->features;
netdev_lockdep_set_classes(net);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 317dbe9..abaf815 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1207,6 +1207,7 @@ static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device,
/* Compute tx offload settings based on hw capabilities */
net->hw_features |= NETIF_F_RXCSUM;
+ net->hw_features |= NETIF_F_SG;
if ((hwcaps.csum.ip4_txcsum & NDIS_TXCSUM_ALL_TCP4) == NDIS_TXCSUM_ALL_TCP4) {
/* Can checksum TCP */
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next,v2, 2/2] hv_netvsc: Sync offloading features to VF NIC
From: Haiyang Zhang @ 2019-09-05 23:23 UTC (permalink / raw)
To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org
Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org,
Mark Bloch
In-Reply-To: <1567725722-33552-1-git-send-email-haiyangz@microsoft.com>
VF NIC may go down then come up during host servicing events. This
causes the VF NIC offloading feature settings to roll back to the
defaults. This patch can synchronize features from synthetic NIC to
the VF NIC during ndo_set_features (ethtool -K),
and netvsc_register_vf when VF comes back after host events.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Mark Bloch <markb@mellanox.com>
---
drivers/net/hyperv/netvsc_drv.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1f1192e..39dddcd 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1785,13 +1785,15 @@ static int netvsc_set_features(struct net_device *ndev,
netdev_features_t change = features ^ ndev->features;
struct net_device_context *ndevctx = netdev_priv(ndev);
struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
+ struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev);
struct ndis_offload_params offloads;
+ int ret = 0;
if (!nvdev || nvdev->destroy)
return -ENODEV;
if (!(change & NETIF_F_LRO))
- return 0;
+ goto syncvf;
memset(&offloads, 0, sizeof(struct ndis_offload_params));
@@ -1803,7 +1805,19 @@ static int netvsc_set_features(struct net_device *ndev,
offloads.rsc_ip_v6 = NDIS_OFFLOAD_PARAMETERS_RSC_DISABLED;
}
- return rndis_filter_set_offload_params(ndev, nvdev, &offloads);
+ ret = rndis_filter_set_offload_params(ndev, nvdev, &offloads);
+
+ if (ret)
+ features ^= NETIF_F_LRO;
+
+syncvf:
+ if (!vf_netdev)
+ return ret;
+
+ vf_netdev->wanted_features = features;
+ netdev_update_features(vf_netdev);
+
+ return ret;
}
static u32 netvsc_get_msglevel(struct net_device *ndev)
@@ -2181,6 +2195,10 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
dev_hold(vf_netdev);
rcu_assign_pointer(net_device_ctx->vf_netdev, vf_netdev);
+
+ vf_netdev->wanted_features = ndev->features;
+ netdev_update_features(vf_netdev);
+
return NOTIFY_OK;
}
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH bpf-next] kbuild: replace BASH-specific ${@:2} with shift and ${@}
From: Yonghong Song @ 2019-09-05 23:59 UTC (permalink / raw)
To: Andrii Nakryiko, bpf@vger.kernel.org, netdev@vger.kernel.org,
Alexei Starovoitov, daniel@iogearbox.net
Cc: andrii.nakryiko@gmail.com, Kernel Team, Stephen Rothwell,
Masahiro Yamada
In-Reply-To: <20190905175938.599455-1-andriin@fb.com>
On 9/5/19 10:59 AM, Andrii Nakryiko wrote:
> ${@:2} is BASH-specific extension, which makes link-vmlinux.sh rely on
> BASH. Use shift and ${@} instead to fix this issue.
>
> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Tested with bash/sh/csh, all works.
Acked-by: Yonghong Song <yhs@fb.com>
> ---
> scripts/link-vmlinux.sh | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 0d8f41db8cd6..8c59970a09dc 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -57,12 +57,16 @@ modpost_link()
>
> # Link of vmlinux
> # ${1} - output file
> -# ${@:2} - optional extra .o files
> +# ${2}, ${3}, ... - optional extra .o files
> vmlinux_link()
> {
> local lds="${objtree}/${KBUILD_LDS}"
> + local output=${1}
> local objects
>
> + # skip output file argument
> + shift
> +
> if [ "${SRCARCH}" != "um" ]; then
> objects="--whole-archive \
> ${KBUILD_VMLINUX_OBJS} \
> @@ -70,9 +74,10 @@ vmlinux_link()
> --start-group \
> ${KBUILD_VMLINUX_LIBS} \
> --end-group \
> - ${@:2}"
> + ${@}"
>
> - ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${1} \
> + ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} \
> + -o ${output} \
> -T ${lds} ${objects}
> else
> objects="-Wl,--whole-archive \
> @@ -81,9 +86,10 @@ vmlinux_link()
> -Wl,--start-group \
> ${KBUILD_VMLINUX_LIBS} \
> -Wl,--end-group \
> - ${@:2}"
> + ${@}"
>
> - ${CC} ${CFLAGS_vmlinux} -o ${1} \
> + ${CC} ${CFLAGS_vmlinux} \
> + -o ${output} \
> -Wl,-T,${lds} \
> ${objects} \
> -lutil -lrt -lpthread
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox