* Re: [PATCH net-next] net: bridge: switchdev: check br_vlan_group() return value
From: patchwork-bot+netdevbpf @ 2022-04-22 22:20 UTC (permalink / raw)
To: Clément Léger <clement.leger@bootlin.com>
Cc: roopa, razor, davem, kuba, pabeni, tobias, bridge, netdev,
linux-kernel
In-Reply-To: <20220421101247.121896-1-clement.leger@bootlin.com>
Hello:
This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 21 Apr 2022 12:12:47 +0200 you wrote:
> br_vlan_group() can return NULL and thus return value must be checked
> to avoid dereferencing a NULL pointer.
>
> Fixes: 6284c723d9b9 ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations")
> Signed-off-by: Clément Léger <clement.leger@bootlin.com>
> ---
> net/bridge/br_switchdev.c | 2 ++
> 1 file changed, 2 insertions(+)
Here is the summary with links:
- [net-next] net: bridge: switchdev: check br_vlan_group() return value
https://git.kernel.org/netdev/net/c/7f40ea2145d9
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net-next] qed: Remove IP services API.
From: patchwork-bot+netdevbpf @ 2022-04-22 22:20 UTC (permalink / raw)
To: Guillaume Nault
Cc: davem, kuba, pabeni, netdev, aelior, manishc, irusskikh, nassa,
pkushwaha, okulkarni, mkalderon, smalin, hare
In-Reply-To: <351ac8c847980e22850eb390553f8cc0e1ccd0ce.1650545051.git.gnault@redhat.com>
Hello:
This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 21 Apr 2022 14:47:26 +0200 you wrote:
> qed_nvmetcp_ip_services.c and its corresponding header file were
> introduced in commit 806ee7f81a2b ("qed: Add IP services APIs support")
> but there's still no users for any of the functions they declare.
> Since these files are effectively unused, let's just drop them.
>
> Found by code inspection. Compile-tested only.
>
> [...]
Here is the summary with links:
- [net-next] qed: Remove IP services API.
https://git.kernel.org/netdev/net-next/c/5e7260712b9a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH] WireGuard: restrict packet handling to non-isolated CPUs.
From: Charles-François Natali @ 2022-04-22 22:23 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: wireguard, netdev, linux-crypto, Daniel Jordan, Steffen Klassert
In-Reply-To: <YmHwjdfZJJ2DeLTK@zx2c4.com>
Hi,
On Fri, 22 Apr 2022 at 01:02, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> netdev@ - Original thread is at
> https://lore.kernel.org/wireguard/20220405212129.2270-1-cf.natali@gmail.com/
>
> Hi Charles-François,
>
> On Tue, Apr 05, 2022 at 10:21:29PM +0100, Charles-Francois Natali wrote:
> > WireGuard currently uses round-robin to dispatch the handling of
> > packets, handling them on all online CPUs, including isolated ones
> > (isolcpus).
> >
> > This is unfortunate because it causes significant latency on isolated
> > CPUs - see e.g. below over 240 usec:
> >
> > kworker/47:1-2373323 [047] 243644.756405: funcgraph_entry: | process_one_work() {
> > kworker/47:1-2373323 [047] 243644.756406: funcgraph_entry: | wg_packet_decrypt_worker() {
> > [...]
> > kworker/47:1-2373323 [047] 243644.756647: funcgraph_exit: 0.591 us | }
> > kworker/47:1-2373323 [047] 243644.756647: funcgraph_exit: ! 242.655 us | }
> >
> > Instead, restrict to non-isolated CPUs.
>
> Huh, interesting... I haven't seen this feature before. What's the
> intended use case? To never run _anything_ on those cores except
> processes you choose? To run some things but not intensive things? Is it
> sort of a RT-lite?
Yes, the idea is to not run anything on those cores: no user tasks, no unbound
workqueues, etc.
Typically one would also set IRQ affinity etc to avoid those cores, to avoid
(soft)IRQS which cause significant latency as well.
This series by Frederic Weisbecker is a good introduction:
https://www.suse.com/c/cpu-isolation-introduction-part-1/
The idea is to achieve low latency and jitter.
With a reasonably tuned kernel one can reach around 10usec latency - however
whenever we start using wireguard, we can see the bound workqueues used for
round-robin dispatch cause up to 1ms stalls, which is just not acceptable
for us.
Currently our only option is to either patch the wireguard code, or stop
using it, which would be a shame :).
> I took a look in padata/pcrypt and it doesn't look like they're
> examining the housekeeping mask at all. Grepping for
> housekeeping_cpumask doesn't appear to show many results in things like
> workqueues, but rather in core scheduling stuff. So I'm not quite sure
> what to make of this patch.
Thanks, I didn't know about padata, but after skimming through the code it does
seem that it would suffer from the same issue.
> I suspect the thing to do might be to patch both wireguard and padata,
> and send a patch series to me, the padata people, and
> netdev@vger.kernel.org, and we can all hash this out together.
Sure, I'll try to have a look at the padata code and write something up.
> Regarding your patch, is there a way to make that a bit more succinct,
> without introducing all of those helper functions? It seems awfully
> verbose for something that seems like a matter of replacing the online
> mask with the housekeeping mask.
Indeed, I wasn't really happy about that.
The reason I've written those helper functions is that the housekeeping mask
includes possible CPUs (cpu_possible_mask), so unfortunately it's not just a
matter of e.g. replacing cpu_online_mask with
housekeeping_cpumask(HK_FLAG_DOMAIN), we have to perform an AND
whenever we compute the weight, find the next CPU in the mask etc.
And I'd rather have the operations and mask in a single location instead of
scattered throughout the code, to make it easier to understand and maintain.
Happy to change to something more inline though, or open to suggestions.
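For illustration only, the AND-based helpers described above could be
sketched in userspace C as follows. This is not the code from the patch:
a plain 64-bit word stands in for struct cpumask, and the names
usable_cpus(), usable_weight() and usable_next_wrap() are hypothetical. The
point is only that every weight or next-CPU computation goes through one
place that combines the online mask with the housekeeping mask.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for struct cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t cpu_mask_t;

#define NR_CPUS_SKETCH 64

/* CPUs that are both online and allowed to do housekeeping work. */
static cpu_mask_t usable_cpus(cpu_mask_t online, cpu_mask_t housekeeping)
{
	return online & housekeeping;
}

/* Number of usable CPUs (the "weight" of the combined mask). */
static unsigned int usable_weight(cpu_mask_t online, cpu_mask_t housekeeping)
{
	cpu_mask_t m = usable_cpus(online, housekeeping);
	unsigned int n = 0;

	for (; m; m &= m - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/*
 * Round-robin step: return the first usable CPU strictly after 'prev',
 * wrapping around to the lowest usable CPU. Pass prev = -1 to get the
 * first usable CPU. Returns -1 if no CPU is usable.
 */
static int usable_next_wrap(int prev, cpu_mask_t online, cpu_mask_t housekeeping)
{
	cpu_mask_t m = usable_cpus(online, housekeeping);
	int i;

	assert(prev >= -1 && prev < NR_CPUS_SKETCH);
	if (!m)
		return -1;
	for (i = 1; i <= NR_CPUS_SKETCH; i++) {
		int cpu = (prev + i) % NR_CPUS_SKETCH;

		if (m & ((cpu_mask_t)1 << cpu))
			return cpu;
	}
	return -1;	/* not reached: m is non-zero */
}
```

With CPUs 0-7 online and a housekeeping mask covering CPUs 0-3 plus a
possible-but-offline CPU 12, the usable set is CPUs 0-3, and stepping from
CPU 3 wraps back to CPU 0 rather than landing on the offline CPU.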
Cheers,
Charles
>
> Jason
^ permalink raw reply
* Re: [PATCH net-next v1 10/17] net/mlx5: Clean IPsec FS add/delete rules
From: Saeed Mahameed @ 2022-04-22 22:25 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Paolo Abeni, Jakub Kicinski, David S . Miller, Leon Romanovsky,
Jason Gunthorpe, linux-netdev, Raed Salem
In-Reply-To: <874f16edb960923bb25c83382d96cd4cb3732485.1650363043.git.leonro@nvidia.com>
On 19 Apr 13:13, Leon Romanovsky wrote:
>From: Leon Romanovsky <leonro@nvidia.com>
>
>Reuse existing struct to pass parameters instead of open code them.
>
Why? What do you mean by "open code them"? They are not open coded, they are
primitives for a reason! If we go with this reasoning, then let's pass
mlx5e_priv to all functions and just forget about modularity.
>Reviewed-by: Raed Salem <raeds@nvidia.com>
>Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>---
> .../mellanox/mlx5/core/en_accel/ipsec.c | 10 +---
> .../mellanox/mlx5/core/en_accel/ipsec.h | 7 +--
> .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 55 ++++++++++---------
> 3 files changed, 34 insertions(+), 38 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>index 537311a74bfb..81c9831ad286 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>@@ -313,9 +313,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
> if (err)
> goto err_xfrm;
>
>- err = mlx5e_accel_ipsec_fs_add_rule(priv, &sa_entry->attrs,
>- sa_entry->ipsec_obj_id,
>- &sa_entry->ipsec_rule);
>+ err = mlx5e_accel_ipsec_fs_add_rule(priv, sa_entry);
To add to my comment on the previous patch: here the issue is more severe.
Previously ipsec_fs.c was unaware of the sa_entry object and dealt only with
pure fs-related objects. You are peppering the code with sa_entry for no
reason other than reducing the function parameters from 4 to 2.
> if (err)
> goto err_hw_ctx;
>
>@@ -333,8 +331,7 @@ static int mlx5e_xfrm_add_state(struct xfrm_state *x)
> goto out;
>
> err_add_rule:
>- mlx5e_accel_ipsec_fs_del_rule(priv, &sa_entry->attrs,
>- &sa_entry->ipsec_rule);
>+ mlx5e_accel_ipsec_fs_del_rule(priv, sa_entry);
> err_hw_ctx:
> mlx5_ipsec_free_sa_ctx(sa_entry);
> err_xfrm:
>@@ -357,8 +354,7 @@ static void mlx5e_xfrm_free_state(struct xfrm_state *x)
> struct mlx5e_priv *priv = netdev_priv(x->xso.dev);
>
> cancel_work_sync(&sa_entry->modify_work.work);
>- mlx5e_accel_ipsec_fs_del_rule(priv, &sa_entry->attrs,
>- &sa_entry->ipsec_rule);
>+ mlx5e_accel_ipsec_fs_del_rule(priv, sa_entry);
> mlx5_ipsec_free_sa_ctx(sa_entry);
> kfree(sa_entry);
> }
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>index cdcb95f90623..af1467cbb7c7 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>@@ -176,12 +176,9 @@ struct xfrm_state *mlx5e_ipsec_sadb_rx_lookup(struct mlx5e_ipsec *dev,
> void mlx5e_accel_ipsec_fs_cleanup(struct mlx5e_ipsec *ipsec);
> int mlx5e_accel_ipsec_fs_init(struct mlx5e_ipsec *ipsec);
> int mlx5e_accel_ipsec_fs_add_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- u32 ipsec_obj_id,
>- struct mlx5e_ipsec_rule *ipsec_rule);
>+ struct mlx5e_ipsec_sa_entry *sa_entry);
> void mlx5e_accel_ipsec_fs_del_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- struct mlx5e_ipsec_rule *ipsec_rule);
>+ struct mlx5e_ipsec_sa_entry *sa_entry);
>
> int mlx5_ipsec_create_sa_ctx(struct mlx5e_ipsec_sa_entry *sa_entry);
> void mlx5_ipsec_free_sa_ctx(struct mlx5e_ipsec_sa_entry *sa_entry);
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
>index 96ab2e9d6f9a..342828351254 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
>@@ -454,11 +454,12 @@ static void setup_fte_common(struct mlx5_accel_esp_xfrm_attrs *attrs,
> }
>
> static int rx_add_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- u32 ipsec_obj_id,
>- struct mlx5e_ipsec_rule *ipsec_rule)
>+ struct mlx5e_ipsec_sa_entry *sa_entry)
> {
> u8 action[MLX5_UN_SZ_BYTES(set_add_copy_action_in_auto)] = {};
>+ struct mlx5e_ipsec_rule *ipsec_rule = &sa_entry->ipsec_rule;
>+ struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs;
>+ u32 ipsec_obj_id = sa_entry->ipsec_obj_id;
> struct mlx5_modify_hdr *modify_hdr = NULL;
> struct mlx5e_accel_fs_esp_prot *fs_prot;
> struct mlx5_flow_destination dest = {};
>@@ -532,9 +533,7 @@ static int rx_add_rule(struct mlx5e_priv *priv,
> }
>
> static int tx_add_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- u32 ipsec_obj_id,
>- struct mlx5e_ipsec_rule *ipsec_rule)
>+ struct mlx5e_ipsec_sa_entry *sa_entry)
> {
> struct mlx5_flow_act flow_act = {};
> struct mlx5_flow_handle *rule;
>@@ -551,7 +550,8 @@ static int tx_add_rule(struct mlx5e_priv *priv,
> goto out;
> }
>
>- setup_fte_common(attrs, ipsec_obj_id, spec, &flow_act);
>+ setup_fte_common(&sa_entry->attrs, sa_entry->ipsec_obj_id, spec,
>+ &flow_act);
>
> /* Add IPsec indicator in metadata_reg_a */
> spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_2;
>@@ -566,11 +566,11 @@ static int tx_add_rule(struct mlx5e_priv *priv,
> if (IS_ERR(rule)) {
> err = PTR_ERR(rule);
> netdev_err(priv->netdev, "fail to add ipsec rule attrs->action=0x%x, err=%d\n",
>- attrs->action, err);
>+ sa_entry->attrs.action, err);
> goto out;
> }
>
>- ipsec_rule->rule = rule;
>+ sa_entry->ipsec_rule.rule = rule;
>
> out:
> kvfree(spec);
>@@ -580,21 +580,25 @@ static int tx_add_rule(struct mlx5e_priv *priv,
> }
>
> static void rx_del_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- struct mlx5e_ipsec_rule *ipsec_rule)
>+ struct mlx5e_ipsec_sa_entry *sa_entry)
> {
>+ struct mlx5e_ipsec_rule *ipsec_rule = &sa_entry->ipsec_rule;
>+
> mlx5_del_flow_rules(ipsec_rule->rule);
> ipsec_rule->rule = NULL;
>
> mlx5_modify_header_dealloc(priv->mdev, ipsec_rule->set_modify_hdr);
> ipsec_rule->set_modify_hdr = NULL;
>
>- rx_ft_put(priv, attrs->is_ipv6 ? ACCEL_FS_ESP6 : ACCEL_FS_ESP4);
>+ rx_ft_put(priv,
>+ sa_entry->attrs.is_ipv6 ? ACCEL_FS_ESP6 : ACCEL_FS_ESP4);
> }
>
> static void tx_del_rule(struct mlx5e_priv *priv,
>- struct mlx5e_ipsec_rule *ipsec_rule)
>+ struct mlx5e_ipsec_sa_entry *sa_entry)
> {
>+ struct mlx5e_ipsec_rule *ipsec_rule = &sa_entry->ipsec_rule;
>+
> mlx5_del_flow_rules(ipsec_rule->rule);
> ipsec_rule->rule = NULL;
>
>@@ -602,24 +606,23 @@ static void tx_del_rule(struct mlx5e_priv *priv,
> }
>
> int mlx5e_accel_ipsec_fs_add_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- u32 ipsec_obj_id,
>- struct mlx5e_ipsec_rule *ipsec_rule)
>+ struct mlx5e_ipsec_sa_entry *sa_entry)
> {
>- if (attrs->action == MLX5_ACCEL_ESP_ACTION_DECRYPT)
>- return rx_add_rule(priv, attrs, ipsec_obj_id, ipsec_rule);
>- else
>- return tx_add_rule(priv, attrs, ipsec_obj_id, ipsec_rule);
>+ if (sa_entry->attrs.action == MLX5_ACCEL_ESP_ACTION_ENCRYPT)
>+ return tx_add_rule(priv, sa_entry);
>+
>+ return rx_add_rule(priv, sa_entry);
> }
>
> void mlx5e_accel_ipsec_fs_del_rule(struct mlx5e_priv *priv,
>- struct mlx5_accel_esp_xfrm_attrs *attrs,
>- struct mlx5e_ipsec_rule *ipsec_rule)
>+ struct mlx5e_ipsec_sa_entry *sa_entry)
> {
>- if (attrs->action == MLX5_ACCEL_ESP_ACTION_DECRYPT)
>- rx_del_rule(priv, attrs, ipsec_rule);
>- else
>- tx_del_rule(priv, ipsec_rule);
>+ if (sa_entry->attrs.action == MLX5_ACCEL_ESP_ACTION_ENCRYPT) {
>+ tx_del_rule(priv, sa_entry);
>+ return;
>+ }
>+
>+ rx_del_rule(priv, sa_entry);
> }
>
> void mlx5e_accel_ipsec_fs_cleanup(struct mlx5e_ipsec *ipsec)
>--
>2.35.1
>
^ permalink raw reply
* Re: [PATCH net v3] tcp: ensure to use the most recently sent skb when filling the rate sample
From: patchwork-bot+netdevbpf @ 2022-04-22 22:40 UTC (permalink / raw)
To: Pengcheng Yang
Cc: edumazet, ncardwell, netdev, davem, yoshfuji, dsahern, kuba,
pabeni
In-Reply-To: <1650422081-22153-1-git-send-email-yangpc@wangsu.com>
Hello:
This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 20 Apr 2022 10:34:41 +0800 you wrote:
> If an ACK (s)acks multiple skbs, we favor the information
> from the most recently sent skb by choosing the skb with
> the highest prior_delivered count. But in the interval
> between receiving ACKs, we send multiple skbs with the same
> prior_delivered, because the tp->delivered only changes
> when we receive an ACK.
>
> [...]
Here is the summary with links:
- [net,v3] tcp: ensure to use the most recently sent skb when filling the rate sample
https://git.kernel.org/netdev/net/c/b253a0680cea
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net-next v1 13/17] net/mlx5: Simplify IPsec capabilities logic
From: Saeed Mahameed @ 2022-04-22 22:42 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Paolo Abeni, Jakub Kicinski, David S . Miller, Leon Romanovsky,
Jason Gunthorpe, linux-netdev, Raed Salem
In-Reply-To: <f47d197be948ce44772baf3276a1a855ad2f210a.1650363043.git.leonro@nvidia.com>
On 19 Apr 13:13, Leon Romanovsky wrote:
>From: Leon Romanovsky <leonro@nvidia.com>
>
>Reduce number of hard-coded IPsec capabilities by making sure
>that mlx5_ipsec_device_caps() sets only supported bits.
>
>As part of this change, remove _accel_ notations from the names
>and prepare the code to IPsec full offload mode.
>
Can you explain why you remove the __accel__ notation?
The __accel__ notation and the decoupling from other common netdev features
are there for modularity: the en_accel directories are kept separate so we
can implement complex/stateful accelerations while avoiding
contaminating/affecting the common, performance-sensitive data-path flows.
I think keeping the __accel__ notation is a must here for the above reasons,
unless you have a stronger reason to remove it.
>Reviewed-by: Raed Salem <raeds@nvidia.com>
>Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>---
> .../mellanox/mlx5/core/en_accel/ipsec.c | 16 ++------------
> .../mellanox/mlx5/core/en_accel/ipsec.h | 9 +++-----
> .../mlx5/core/en_accel/ipsec_offload.c | 22 +++++++++----------
> 3 files changed, 16 insertions(+), 31 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>index 28729b1cc6e6..be7650d2cfd3 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>@@ -215,7 +215,7 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
> return -EINVAL;
> }
> if (x->props.flags & XFRM_STATE_ESN &&
>- !(mlx5_ipsec_device_caps(priv->mdev) & MLX5_ACCEL_IPSEC_CAP_ESN)) {
>+ !(mlx5_ipsec_device_caps(priv->mdev) & MLX5_IPSEC_CAP_ESN)) {
> netdev_info(netdev, "Cannot offload ESN xfrm states\n");
> return -EINVAL;
> }
>@@ -262,11 +262,6 @@ static inline int mlx5e_xfrm_validate_state(struct xfrm_state *x)
> netdev_info(netdev, "Cannot offload xfrm states with geniv other than seqiv\n");
> return -EINVAL;
> }
>- if (x->props.family == AF_INET6 &&
>- !(mlx5_ipsec_device_caps(priv->mdev) & MLX5_ACCEL_IPSEC_CAP_IPV6)) {
>- netdev_info(netdev, "IPv6 xfrm state offload is not supported by this device\n");
>- return -EINVAL;
>- }
> return 0;
> }
>
>@@ -457,12 +452,6 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv)
> if (!mlx5_ipsec_device_caps(mdev))
> return;
>
>- if (!(mlx5_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_ESP) ||
>- !MLX5_CAP_ETH(mdev, swp)) {
>- mlx5_core_dbg(mdev, "mlx5e: ESP and SWP offload not supported\n");
>- return;
>- }
>-
> mlx5_core_info(mdev, "mlx5e: IPSec ESP acceleration enabled\n");
> netdev->xfrmdev_ops = &mlx5e_ipsec_xfrmdev_ops;
> netdev->features |= NETIF_F_HW_ESP;
>@@ -476,8 +465,7 @@ void mlx5e_ipsec_build_netdev(struct mlx5e_priv *priv)
> netdev->features |= NETIF_F_HW_ESP_TX_CSUM;
> netdev->hw_enc_features |= NETIF_F_HW_ESP_TX_CSUM;
>
>- if (!(mlx5_ipsec_device_caps(mdev) & MLX5_ACCEL_IPSEC_CAP_LSO) ||
>- !MLX5_CAP_ETH(mdev, swp_lso)) {
>+ if (!MLX5_CAP_ETH(mdev, swp_lso)) {
> mlx5_core_dbg(mdev, "mlx5e: ESP LSO not supported\n");
> return;
> }
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>index af1467cbb7c7..97c55620089d 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>@@ -102,12 +102,9 @@ struct mlx5_accel_esp_xfrm_attrs {
> u8 is_ipv6;
> };
>
>-enum mlx5_accel_ipsec_cap {
>- MLX5_ACCEL_IPSEC_CAP_DEVICE = 1 << 0,
>- MLX5_ACCEL_IPSEC_CAP_ESP = 1 << 1,
>- MLX5_ACCEL_IPSEC_CAP_IPV6 = 1 << 2,
>- MLX5_ACCEL_IPSEC_CAP_LSO = 1 << 3,
>- MLX5_ACCEL_IPSEC_CAP_ESN = 1 << 4,
>+enum mlx5_ipsec_cap {
>+ MLX5_IPSEC_CAP_CRYPTO = 1 << 0,
>+ MLX5_IPSEC_CAP_ESN = 1 << 1,
> };
>
> struct mlx5e_priv;
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>index 817747d5229e..b44bce3f4ef1 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>@@ -7,7 +7,7 @@
>
> u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev)
> {
>- u32 caps;
>+ u32 caps = 0;
>
> if (!MLX5_CAP_GEN(mdev, ipsec_offload))
> return 0;
>@@ -19,23 +19,23 @@ u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev)
> MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC))
> return 0;
>
>- if (!MLX5_CAP_IPSEC(mdev, ipsec_crypto_offload) ||
>- !MLX5_CAP_ETH(mdev, insert_trailer))
>- return 0;
>-
> if (!MLX5_CAP_FLOWTABLE_NIC_TX(mdev, ipsec_encrypt) ||
> !MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ipsec_decrypt))
> return 0;
>
>- caps = MLX5_ACCEL_IPSEC_CAP_DEVICE | MLX5_ACCEL_IPSEC_CAP_IPV6 |
>- MLX5_ACCEL_IPSEC_CAP_LSO;
>+ if (!MLX5_CAP_IPSEC(mdev, ipsec_crypto_esp_aes_gcm_128_encrypt) ||
>+ !MLX5_CAP_IPSEC(mdev, ipsec_crypto_esp_aes_gcm_128_decrypt))
>+ return 0;
>
>- if (MLX5_CAP_IPSEC(mdev, ipsec_crypto_esp_aes_gcm_128_encrypt) &&
>- MLX5_CAP_IPSEC(mdev, ipsec_crypto_esp_aes_gcm_128_decrypt))
>- caps |= MLX5_ACCEL_IPSEC_CAP_ESP;
>+ if (MLX5_CAP_IPSEC(mdev, ipsec_crypto_offload) &&
>+ MLX5_CAP_ETH(mdev, insert_trailer) && MLX5_CAP_ETH(mdev, swp))
>+ caps |= MLX5_IPSEC_CAP_CRYPTO;
>+
>+ if (!caps)
>+ return 0;
>
> if (MLX5_CAP_IPSEC(mdev, ipsec_esn))
>- caps |= MLX5_ACCEL_IPSEC_CAP_ESN;
>+ caps |= MLX5_IPSEC_CAP_ESN;
>
> /* We can accommodate up to 2^24 different IPsec objects
> * because we use up to 24 bit in flow table metadata
>--
>2.35.1
>
^ permalink raw reply
* Re: [PATCH v2 bpf-next 2/2] selftests/bpf: handle batch operations for map-in-map bpf-maps
From: Daniel Borkmann @ 2022-04-22 22:43 UTC (permalink / raw)
To: Takshak Chahande, netdev, bpf
Cc: andrii, ast, kernel-team, ndixit, kafai, andriin
In-Reply-To: <20220422005044.4099919-2-ctakshak@fb.com>
On 4/22/22 2:50 AM, Takshak Chahande wrote:
[...]
> +static void fetch_and_validate(int outer_map_fd,
> + __u32 *inner_map_fds,
> + struct bpf_map_batch_opts *opts,
> + __u32 batch_size, bool delete_entries)
> +{
> + __u32 *fetched_keys, *fetched_values, fetched_entries = 0;
> + __u32 next_batch_key = 0, step_size = 5;
> + int err, retries = 0, max_retries = 3;
> + __u32 value_size = sizeof(__u32);
> +
> + fetched_keys = calloc(batch_size, value_size);
> + fetched_values = calloc(batch_size, value_size);
> +
> + while (fetched_entries < batch_size) {
> + err = delete_entries
> + ? bpf_map_lookup_and_delete_batch(outer_map_fd,
> + fetched_entries ? &next_batch_key : NULL,
> + &next_batch_key,
> + fetched_keys + fetched_entries,
> + fetched_values + fetched_entries,
> + &step_size, opts)
> + : bpf_map_lookup_batch(outer_map_fd,
> + fetched_entries ? &next_batch_key : NULL,
> + &next_batch_key,
> + fetched_keys + fetched_entries,
> + fetched_values + fetched_entries,
> + &step_size, opts);
> + CHECK((err < 0 && (errno != ENOENT && errno != ENOSPC)),
> + "lookup with steps failed",
> + "error: %s\n", strerror(errno));
> +
> + fetched_entries += step_size;
> + /* retry for max_retries if ENOSPC */
> + if (errno == ENOSPC)
> + ++retries;
> +
> + if (retries >= max_retries)
> + break;
> + }
> +
> + CHECK((fetched_entries != batch_size && err != ENOSPC),
> + "Unable to fetch expected entries !",
> + "fetched_entries(%d) and batch_size(%d) error: (%d):%s\n",
> + fetched_entries, batch_size, errno, strerror(errno));
> +
Looks like BPF CI in test_maps trips right here:
[...]
test_lpm_trie_map_batch_ops:PASS
batch_op is successful for batch_size(5)
batch_op is successful for batch_size(10)
test_map_in_map_batch_ops_array:PASS with inner ARRAY map
batch_op is successful for batch_size(5)
batch_op is successful for batch_size(10)
test_map_in_map_batch_ops_array:PASS with inner HASH map
fetch_and_validate(158):FAIL:Unable to fetch expected entries ! fetched_entries(8) and batch_size(5) error: (2):No such file or directory
test_verifier - Testing test_verifier
collect_status - Collect status
shutdown - Shutdown
Test Results:
bpftool: PASS
test_progs: PASS
test_progs-no_alu32: PASS
test_maps: FAIL (returned 255)
test_verifier: PASS
shutdown: CLEAN
Error: Process completed with exit code 1.
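The failure line above reports fetched_entries(8) against batch_size(5),
i.e. the loop counted more entries than the buffers (allocated with
batch_size elements) were sized for. The thread does not state the root
cause, so the following is only a sketch of one defensive loop shape, under
stated assumptions: fake_lookup_batch() is a hypothetical in-memory stand-in
and not the libbpf API, and its ENOENT-at-end behaviour is an assumption of
this sketch. It shows two things: clamping each request to the room left in
the buffer, and reading errno only when the call actually failed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory "map": just an array of keys. */
struct fake_map {
	const uint32_t *keys;
	uint32_t nr;
};

/*
 * Hypothetical batched lookup. Copies up to *count keys starting at
 * *in_pos (or 0 if in_pos is NULL), stores the number copied in *count
 * and the resume position in *out_pos. Returns -1 with errno = ENOENT
 * once the last entry has been handed out, 0 otherwise.
 */
static int fake_lookup_batch(const struct fake_map *map, const uint32_t *in_pos,
			     uint32_t *out_pos, uint32_t *keys, uint32_t *count)
{
	uint32_t start = in_pos ? *in_pos : 0;
	uint32_t avail = map->nr - start;
	uint32_t n = *count < avail ? *count : avail;
	uint32_t i;

	for (i = 0; i < n; i++)
		keys[i] = map->keys[start + i];
	*count = n;
	*out_pos = start + n;
	if (start + n == map->nr) {
		errno = ENOENT;
		return -1;
	}
	return 0;
}

/* Fetch at most 'capacity' keys in steps of 'step'; never overruns 'keys'. */
static int fetch_all(const struct fake_map *map, uint32_t *keys,
		     uint32_t capacity, uint32_t step, uint32_t *fetched_out)
{
	uint32_t fetched = 0, pos = 0;

	assert(map && keys && fetched_out);
	if (step == 0)
		return -EINVAL;

	while (fetched < capacity) {
		uint32_t room = capacity - fetched;
		uint32_t count = step < room ? step : room;	/* clamp to room left */
		int err = fake_lookup_batch(map, fetched ? &pos : NULL, &pos,
					    keys + fetched, &count);
		int saved = err < 0 ? errno : 0;	/* errno is stale on success */

		if (err < 0 && saved != ENOENT) {
			*fetched_out = fetched;
			return -saved;
		}
		fetched += count;
		if (saved == ENOENT)
			break;	/* end of map reached */
	}
	*fetched_out = fetched;
	return 0;
}
```

With an 8-entry map, a capacity of 5 and a step of 3, this loop requests 3
and then 2 entries and stops at exactly 5, so the count can never exceed the
buffer size.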
^ permalink raw reply
* Re: [PATCH net-next v1 15/17] net/mlx5: Cleanup XFRM attributes struct
From: Saeed Mahameed @ 2022-04-22 22:45 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Paolo Abeni, Jakub Kicinski, David S . Miller, Leon Romanovsky,
Jason Gunthorpe, linux-netdev, Raed Salem
In-Reply-To: <5910e1bca2a5d34b8669b8ddc6c62943435e566f.1650363043.git.leonro@nvidia.com>
On 19 Apr 13:13, Leon Romanovsky wrote:
>From: Leon Romanovsky <leonro@nvidia.com>
>
>Remove everything that is not used or from mlx5_accel_esp_xfrm_attrs,
>together with change type of spi to store proper type from the beginning.
>
>Reviewed-by: Raed Salem <raeds@nvidia.com>
>Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>---
> .../mellanox/mlx5/core/en_accel/ipsec.c | 10 ++-------
> .../mellanox/mlx5/core/en_accel/ipsec.h | 21 ++-----------------
> .../mellanox/mlx5/core/en_accel/ipsec_fs.c | 4 ++--
> .../mlx5/core/en_accel/ipsec_offload.c | 4 ++--
> 4 files changed, 8 insertions(+), 31 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>index be7650d2cfd3..35e2bb301c26 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
>@@ -137,7 +137,7 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
> struct mlx5_accel_esp_xfrm_attrs *attrs)
> {
> struct xfrm_state *x = sa_entry->x;
>- struct aes_gcm_keymat *aes_gcm = &attrs->keymat.aes_gcm;
>+ struct aes_gcm_keymat *aes_gcm = &attrs->aes_gcm;
> struct aead_geniv_ctx *geniv_ctx;
> struct crypto_aead *aead;
> unsigned int crypto_data_len, key_len;
>@@ -171,12 +171,6 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
> attrs->flags |= MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP;
> }
>
>- /* rx handle */
>- attrs->sa_handle = sa_entry->handle;
>-
>- /* algo type */
>- attrs->keymat_type = MLX5_ACCEL_ESP_KEYMAT_AES_GCM;
>-
> /* action */
> attrs->action = (!(x->xso.flags & XFRM_OFFLOAD_INBOUND)) ?
> MLX5_ACCEL_ESP_ACTION_ENCRYPT :
>@@ -187,7 +181,7 @@ mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
> MLX5_ACCEL_ESP_FLAGS_TUNNEL;
>
> /* spi */
>- attrs->spi = x->id.spi;
>+ attrs->spi = be32_to_cpu(x->id.spi);
>
> /* source , destination ips */
> memcpy(&attrs->saddr, x->props.saddr.a6, sizeof(attrs->saddr));
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>index 97c55620089d..16bcceec16c4 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
>@@ -55,11 +55,6 @@ enum mlx5_accel_esp_action {
> MLX5_ACCEL_ESP_ACTION_ENCRYPT,
> };
>
>-enum mlx5_accel_esp_keymats {
>- MLX5_ACCEL_ESP_KEYMAT_AES_NONE,
>- MLX5_ACCEL_ESP_KEYMAT_AES_GCM,
>-};
>-
> struct aes_gcm_keymat {
> u64 seq_iv;
>
>@@ -73,21 +68,9 @@ struct aes_gcm_keymat {
> struct mlx5_accel_esp_xfrm_attrs {
> enum mlx5_accel_esp_action action;
> u32 esn;
>- __be32 spi;
>- u32 seq;
>- u32 tfc_pad;
>+ u32 spi;
> u32 flags;
>- u32 sa_handle;
>- union {
>- struct {
>- u32 size;
>-
>- } bmp;
>- } replay;
>- enum mlx5_accel_esp_keymats keymat_type;
>- union {
>- struct aes_gcm_keymat aes_gcm;
>- } keymat;
Why do we have so many unused fields? Are these leftovers from FPGA IPsec?
>+ struct aes_gcm_keymat aes_gcm;
>
> union {
> __be32 a4;
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
>index 9d95a0025fd6..8315e8f603d7 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
>@@ -356,8 +356,8 @@ static void setup_fte_common(struct mlx5_accel_esp_xfrm_attrs *attrs,
>
> /* SPI number */
> MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, misc_parameters.outer_esp_spi);
>- MLX5_SET(fte_match_param, spec->match_value, misc_parameters.outer_esp_spi,
>- be32_to_cpu(attrs->spi));
>+ MLX5_SET(fte_match_param, spec->match_value,
>+ misc_parameters.outer_esp_spi, attrs->spi);
>
> if (ip_version == 4) {
> memcpy(MLX5_ADDR_OF(fte_match_param, spec->match_value,
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>index 91ec8b8bf1ec..b13e152fe9fc 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>@@ -50,7 +50,7 @@ static int mlx5_create_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry)
> {
> struct mlx5_accel_esp_xfrm_attrs *attrs = &sa_entry->attrs;
> struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry);
>- struct aes_gcm_keymat *aes_gcm = &attrs->keymat.aes_gcm;
>+ struct aes_gcm_keymat *aes_gcm = &attrs->aes_gcm;
> u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)];
> u32 in[MLX5_ST_SZ_DW(create_ipsec_obj_in)] = {};
> void *obj, *salt_p, *salt_iv_p;
>@@ -106,7 +106,7 @@ static void mlx5_destroy_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry)
>
> int mlx5_ipsec_create_sa_ctx(struct mlx5e_ipsec_sa_entry *sa_entry)
> {
>- struct aes_gcm_keymat *aes_gcm = &sa_entry->attrs.keymat.aes_gcm;
>+ struct aes_gcm_keymat *aes_gcm = &sa_entry->attrs.aes_gcm;
> struct mlx5_core_dev *mdev = mlx5e_ipsec_sa2dev(sa_entry);
> int err;
>
>--
>2.35.1
>
^ permalink raw reply
* Re: [PATCH net-next] mlxsw: core_linecards: Fix size of array element during ini_files allocation
From: patchwork-bot+netdevbpf @ 2022-04-22 22:50 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, davem, kuba, pabeni, petrm, jiri, mlxsw
In-Reply-To: <20220420142007.3041173-1-idosch@nvidia.com>
Hello:
This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 20 Apr 2022 17:20:07 +0300 you wrote:
> From: Jiri Pirko <jiri@nvidia.com>
>
> types_info->ini_files is an array of pointers
> to struct mlxsw_linecard_ini_file.
>
> Fix the kmalloc_array() argument to be of a size of a pointer.
>
> [...]
Here is the summary with links:
- [net-next] mlxsw: core_linecards: Fix size of array element during ini_files allocation
https://git.kernel.org/netdev/net-next/c/869376d0859a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net 1/2] iavf: Fix error when changing ring parameters on ice PF
From: Jakub Kicinski @ 2022-04-22 22:47 UTC (permalink / raw)
To: Tony Nguyen
Cc: davem, pabeni, Michal Maloszewski, netdev, sassmann,
Sylwester Dziedziuch, Konrad Jankowski
In-Reply-To: <20220420172624.931237-2-anthony.l.nguyen@intel.com>
On Wed, 20 Apr 2022 10:26:23 -0700 Tony Nguyen wrote:
> From: Michal Maloszewski <michal.maloszewski@intel.com>
>
> Reset is triggered when ring parameters are being changed through
> ethtool and queues are reconfigured for VF's VSI. If ring is changed
> again immediately, then the next reset could be executed before
> queues could be properly reinitialized on VF's VSI. It caused ice PF
> to mess up the VSI resource tree.
>
> Add a check in iavf_set_ringparam for adapter and VF's queue
> state. If VF is currently resetting or queues are disabled for the VF
> return with EAGAIN error.
Can't we wait for the device to get into the right state?
Throwing EAGAIN back to user space is not very friendly.
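As a rough illustration of the "wait for the right state" alternative, here
is a userspace sketch of a bounded wait. Everything in it is hypothetical:
struct fake_adapter, fake_sleep_tick() and wait_until_ready() are invented
for this example and do not come from the iavf driver, and the choice of
-ETIMEDOUT on timeout is an assumption. In-kernel code would more likely
sleep with something like msleep() or wait_event_timeout() instead of the
simulated tick used here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, loosely modelled on "resetting" and
 * "queues disabled" conditions. polls_until_ready simulates how long
 * the reset takes; 0 means the device never recovers. */
struct fake_adapter {
	bool resetting;
	bool queues_disabled;
	int polls_until_ready;
};

static bool adapter_ready(const struct fake_adapter *a)
{
	return !a->resetting && !a->queues_disabled;
}

/* Stand-in for sleeping: pretend time passes and the reset may finish. */
static void fake_sleep_tick(struct fake_adapter *a)
{
	if (a->polls_until_ready > 0 && --a->polls_until_ready == 0) {
		a->resetting = false;
		a->queues_disabled = false;
	}
}

/*
 * Poll up to max_polls times for the device to become ready.
 * Returns 0 once ready, -ETIMEDOUT if it never gets there.
 */
static int wait_until_ready(struct fake_adapter *a, int max_polls)
{
	int i;

	assert(a != NULL);
	for (i = 0; i < max_polls; i++) {
		if (adapter_ready(a))
			return 0;
		fake_sleep_tick(a);
	}
	return adapter_ready(a) ? 0 : -ETIMEDOUT;
}
```

The caller would only see an error if the device stays busy for the whole
bounded wait, instead of on every request that happens to race with a reset.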
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> index 3bb56714beb0..08efbc50fbe9 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> @@ -631,6 +631,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
> if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
> return -EINVAL;
>
> + if (adapter->state == __IAVF_RESETTING ||
> + (adapter->state == __IAVF_RUNNING &&
> + (adapter->flags & IAVF_FLAG_QUEUES_DISABLED)))
> + return -EAGAIN;
nit: why add this check in the middle of input validation
(i.e. checking the ring params are supported)?
> if (ring->tx_pending > IAVF_MAX_TXD ||
> ring->tx_pending < IAVF_MIN_TXD ||
> ring->rx_pending > IAVF_MAX_RXD ||
^ permalink raw reply
* Re: [PATCH net-next v1 16/17] net/mlx5: Allow future addition of IPsec object modifiers
From: Saeed Mahameed @ 2022-04-22 22:46 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Paolo Abeni, Jakub Kicinski, David S . Miller, Leon Romanovsky,
Jason Gunthorpe, linux-netdev, Raed Salem
In-Reply-To: <42e816e8fbd9cc0fff772c726c546acf92fc60f9.1650363043.git.leonro@nvidia.com>
On 19 Apr 13:13, Leon Romanovsky wrote:
>From: Leon Romanovsky <leonro@nvidia.com>
>
>Currently, all released FW versions support only two IPsec object
>modifiers, and modify_field_select gets and sets the same value with
>the proper bits.
>
>However, this is not future compatible, as new FW can have more
>modifiers and the "default" would overwrite unchanged fields.
>
>Fix it by setting explicitly fields that need to be overwritten.
>
>Fixes: 7ed92f97a1ad ("net/mlx5e: IPsec: Add Connect-X IPsec ESN update offload support")
Will apply this to net-mlx5 and send this to net.
>Signed-off-by: Huy Nguyen <huyn@nvidia.com>
>Reviewed-by: Raed Salem <raeds@nvidia.com>
>Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>---
> .../net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>index b13e152fe9fc..792724ce7336 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_offload.c
>@@ -179,6 +179,9 @@ static int mlx5_modify_ipsec_obj(struct mlx5e_ipsec_sa_entry *sa_entry,
> return -EOPNOTSUPP;
>
> obj = MLX5_ADDR_OF(modify_ipsec_obj_in, in, ipsec_object);
>+ MLX5_SET64(ipsec_obj, obj, modify_field_select,
>+ MLX5_MODIFY_IPSEC_BITMASK_ESN_OVERLAP |
>+ MLX5_MODIFY_IPSEC_BITMASK_ESN_MSB);
> MLX5_SET(ipsec_obj, obj, esn_msb, attrs->esn);
> if (attrs->flags & MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP)
> MLX5_SET(ipsec_obj, obj, esn_overlap, 1);
>--
>2.35.1
>
^ permalink raw reply
* Re: [PATCH 1/1] ixgbe: correct SDP0 check of SFP cage for X550
From: Jakub Kicinski @ 2022-04-22 22:52 UTC (permalink / raw)
To: Jeff Daly
Cc: intel-wired-lan, Stephen Douthit, Jesse Brandeburg, Tony Nguyen,
David S. Miller, Paolo Abeni, Don Skidmore, Jeff Kirsher,
moderated list:INTEL ETHERNET DRIVERS,
open list:NETWORKING DRIVERS, open list
In-Reply-To: <20220420205130.23616-1-jeffd@silicom-usa.com>
On Wed, 20 Apr 2022 16:51:30 -0400 Jeff Daly wrote:
> SDP0 for X550 NICs is active low to indicate the presence of an SFP in the
> cage (MOD_ABS#). Invert the results of the logical AND to set
> sfp_cage_full variable correctly.
>
> Fixes: aac9e053f104 ("ixgbe: cleanup crosstalk fix")
>
No new lines between tags, please.
> Suggested-by: Stephen Douthit <stephend@silicom-usa.com>
> Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> index 4c26c4b92f07..26d16bc85c59 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> @@ -3308,8 +3308,8 @@ s32 ixgbe_check_mac_link_generic(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
> break;
> case ixgbe_mac_X550EM_x:
> case ixgbe_mac_x550em_a:
> - sfp_cage_full = IXGBE_READ_REG(hw, IXGBE_ESDP) &
> - IXGBE_ESDP_SDP0;
> + sfp_cage_full = !(IXGBE_READ_REG(hw, IXGBE_ESDP) &
> + IXGBE_ESDP_SDP0);
nit: you need to adjust the continuation line so that it starts after
the column in which ( is, above. Alternatively you can use ~ on the
result of the register read to avoid the brackets.
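
For readers following the review, here is a minimal userspace sketch of the two forms discussed above. It models the register read as a plain integer, and ESDP_SDP0_MASK is an illustrative stand-in (assumed to be bit 0) rather than the driver's own definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for IXGBE_ESDP_SDP0; assumed to be bit 0 here. */
#define ESDP_SDP0_MASK 0x00000001u

/* Active-low presence: the cage is full when the SDP0 bit reads 0. */
static bool sfp_cage_full_not(uint32_t esdp)
{
	return !(esdp & ESDP_SDP0_MASK);
}

/* Same test with ~ applied to the register value, so no outer brackets. */
static bool sfp_cage_full_inv(uint32_t esdp)
{
	return ~esdp & ESDP_SDP0_MASK;
}
```

Both forms agree when the result is stored in a bool. If the result were kept in a wider integer, the ~ form would yield the mask value rather than 1.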
^ permalink raw reply
* Re: [net-next v1] net: Add a second bind table hashed by port and address
From: Joanne Koong @ 2022-04-22 22:55 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Martin KaFai Lau, David Miller, Jakub Kicinski
In-Reply-To: <CANn89iKOkHHJ-papcMXJvq_8xSE2zXvqTfNSfGhq=Y1y_oKy6A@mail.gmail.com>
On Fri, Apr 22, 2022 at 2:25 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 22, 2022 at 2:07 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Thu, Apr 21, 2022 at 3:50 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Apr 21, 2022 at 3:16 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > We currently have one tcp bind table (bhash) which hashes by port
> > > > number only. In the socket bind path, we check for bind conflicts by
> > > > traversing the specified port's inet_bind2_bucket while holding the
> > > > bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()).
> > > >
> > > > In instances where there are tons of sockets hashed to the same port
> > > > at different addresses, checking for a bind conflict is time-intensive
> > > > > and can cause softirq cpu lockups, as well as stop new tcp connections
> > > > > since __inet_inherit_port() also contends for the spinlock.
> > > >
> > > > This patch proposes adding a second bind table, bhash2, that hashes by
> > > > port and ip address. Searching the bhash2 table leads to significantly
> > > > faster conflict resolution and less time holding the spinlock.
> > > > When experimentally testing this on a local server, the results for how
> > > > long a bind request takes were as follows:
> > > >
> > > > when there are ~24k sockets already bound to the port -
> > > >
> > > > ipv4:
> > > > before - 0.002317 seconds
> > > > with bhash2 - 0.000018 seconds
> > > >
> > > > ipv6:
> > > > before - 0.002431 seconds
> > > > with bhash2 - 0.000021 seconds
> > >
> > >
> > > Hi Joanne
> > >
> > > Do you have a test for this ? Are you using 24k IPv6 addresses on the host ?
> > >
> > > I fear we add some extra code and cost for quite an unusual configuration.
> > >
> > > Thanks.
> > >
> > Hi Eric,
> >
> > I have a test on my local server that populates the bhash table entry
> > with 24k sockets for a given port and address, and then times how long
> > a bind request on that port takes.
>
> OK, but why 24k ? Why not 24 M then ?
>
> In this case, will a 64K hash table be big enough ?
24k was one test case scenario, another one was ~12M; these were used
to get a sense of how the bhash2 table performs in situations where
the bhash table entry for the port is saturated.
>
> When populating the table entry, I
> > use the same IPv6 address on the host (with SO_REUSEADDR set). At
> > Facebook, there are some internal teams that submit bind requests for
> > 400 vips on the same port on concurrent threads that run into softirq
> > lockup issues due to the bhash table entry spinlock contention, which
> > is the main motivation behind this patch.
>
> I am pretty sure the IPv6 stack does not scale well if we have
> thousands of IPv6 addresses on one netdev.
> Some O(N) behavior will also trigger latency violations.
>
> Can you share the test, in a form that can be added in linux tree ?
I will include it somewhere under tools/testing/selftests/net - does that sound okay?
>
> I mean, before today nobody was trying to have 24k listeners on a host,
> so it would be nice to have a regression test for future changes in the stack.
>
> If the goal is to deal with 400 vips, why using 24k in your changelog ?
> I would rather stick to the reality, and not pretend TCP stack should
> scale to 24k listeners.
I chose 24k to test on because one of the internal team's usages is
binding from 80 workers for ~300 vips in parallel for the same port.
>
> I have not looked at the patch yet, I choked on the changelog for
> being exaggerated.
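
The difference between the two tables described in the changelog can be pictured with a toy userspace model. This is only a sketch under stated assumptions: the table size, the mixing function, and all names below are hypothetical and are not the kernel's bhash/bhash2 implementation. It counts how many entries share a bucket when keyed by port alone versus by port and address.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024u /* illustrative table size, a power of two */

/* Hypothetical record of one bound socket. */
struct bound_sock {
	uint16_t port;
	uint32_t addr;
};

/* bhash-style key: port only. */
static unsigned int hash_port(uint16_t port)
{
	return port & (NBUCKETS - 1);
}

/* bhash2-style key: port and address (toy mixing function). */
static unsigned int hash_port_addr(uint16_t port, uint32_t addr)
{
	uint32_t h = (addr * 2654435761u) ^ port;

	return h & (NBUCKETS - 1);
}

/* Entries a conflict check would walk when the table is keyed by port. */
static size_t chain_len_bhash(const struct bound_sock *s, size_t n,
			      uint16_t port)
{
	size_t len = 0;

	for (size_t i = 0; i < n; i++)
		if (hash_port(s[i].port) == hash_port(port))
			len++;
	return len;
}

/* Entries walked when the table is keyed by port and address. */
static size_t chain_len_bhash2(const struct bound_sock *s, size_t n,
			       uint16_t port, uint32_t addr)
{
	size_t len = 0;

	for (size_t i = 0; i < n; i++)
		if (hash_port_addr(s[i].port, s[i].addr) ==
		    hash_port_addr(port, addr))
			len++;
	return len;
}
```

With many sockets on one port at different addresses, the port-only chain grows with the number of sockets, while the port-and-address chain stays short as long as the addresses spread across buckets.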
^ permalink raw reply
* Re: [PATCH] net: unexport csum_and_copy_{from,to}_user
From: Jakub Kicinski @ 2022-04-22 22:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: akpm, x86, linux-alpha, linux-m68k, linuxppc-dev, linux-kernel,
netdev
In-Reply-To: <20220421070440.1282704-1-hch@lst.de>
On Thu, 21 Apr 2022 09:04:40 +0200 Christoph Hellwig wrote:
> csum_and_copy_from_user and csum_and_copy_to_user are exported by
> a few architectures, but not actually used in modular code. Drop
> the exports.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Judging by the To: I presume the intention is for Andrew to take this
one, so FWIW:
Acked-by: Jakub Kicinski <kuba@kernel.org>
^ permalink raw reply
* Re: [PATCH net 0/2] wireguard patches for 5.18-rc4
From: Jakub Kicinski @ 2022-04-22 23:02 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: netdev, davem
In-Reply-To: <20220421134805.279118-1-Jason@zx2c4.com>
On Thu, 21 Apr 2022 15:48:03 +0200 Jason A. Donenfeld wrote:
> Hi Davekub,
YES :D
> Here are two small wireguard fixes for 5.18-rc4:
Missed the PR, sadly. Paolo was handling it and he's on EU time.
^ permalink raw reply
* Re: [PATCH] NFC: nfcmrvl: fix error check return value of irq_of_parse_and_map()
From: Jakub Kicinski @ 2022-04-22 23:09 UTC (permalink / raw)
To: cgel.zte
Cc: krzysztof.kozlowski, davem, lv.ruyi, yashsri421, sameo, cuissard,
netdev, linux-kernel, Zeal Robot
In-Reply-To: <20220422084605.2775542-1-lv.ruyi@zte.com.cn>
On Fri, 22 Apr 2022 08:46:05 +0000 cgel.zte@gmail.com wrote:
> diff --git a/drivers/nfc/nfcmrvl/i2c.c b/drivers/nfc/nfcmrvl/i2c.c
> index ceef81d93ac9..7dcc97707363 100644
> --- a/drivers/nfc/nfcmrvl/i2c.c
> +++ b/drivers/nfc/nfcmrvl/i2c.c
> @@ -167,7 +167,7 @@ static int nfcmrvl_i2c_parse_dt(struct device_node *node,
> pdata->irq_polarity = IRQF_TRIGGER_RISING;
>
> ret = irq_of_parse_and_map(node, 0);
> - if (ret < 0) {
> + if (!ret) {
> pr_err("Unable to get irq, error: %d\n", ret);
> return ret;
If ret is guaranteed to be 0 in this branch now, why print it,
and how is it okay to return it from this function on error?
The usual low quality patch from the CGEL team :/
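
The point about ret can be illustrated with a small userspace sketch. stub_irq_map() is a hypothetical stand-in that mimics the "0 on failure" convention of irq_of_parse_and_map(), and -EINVAL is an assumed error code chosen only for illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in: returns 0 on failure, a positive IRQ on success. */
static unsigned int stub_irq_map(int have_irq)
{
	return have_irq ? 42u : 0u;
}

/* Returns 0 and stores the IRQ on success, a negative errno on failure. */
static int parse_irq(int have_irq, unsigned int *irq_out)
{
	unsigned int irq = stub_irq_map(have_irq);

	if (!irq) {
		/* irq is known to be 0 here, so printing it adds nothing. */
		fprintf(stderr, "Unable to get irq\n");
		return -EINVAL; /* assumed error code for this sketch */
	}

	*irq_out = irq;
	return 0;
}
```

Returning the raw 0 from the failure branch would look like success to a caller that tests for a negative value, which is why the sketch maps it to an explicit errno.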
^ permalink raw reply
* Re: [patch iproute2-next] devlink: introduce -h[ex] cmdline option to allow dumping numbers in hex format
From: Stephen Hemminger @ 2022-04-22 23:10 UTC (permalink / raw)
To: Shannon Nelson; +Cc: Jiri Pirko, netdev, sthemmin, dsahern
In-Reply-To: <56b4d3e4-0274-10d8-0746-954750eac085@pensando.io>
On Fri, 22 Apr 2022 14:36:21 -0700
Shannon Nelson <snelson@pensando.io> wrote:
> > static int fmsg_value_show(struct dl *dl, int type, struct nlattr *nl_data)
> > {
> > + const char *num_fmt = dl->hex ? "%x" : "%u";
> > + const char *num64_fmt = dl->hex ? "%"PRIx64 : "%"PRIu64;
>
> Can we get a leading "0x" on these to help identify that they are hex
> digits?
Yes use %#x
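
A minimal sketch of what the "#" flag does, using standard C formatting only. The helper names are hypothetical and are not part of devlink.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 32-bit value as decimal, or as hex with a 0x prefix. */
static int fmt_u32(char *buf, size_t len, unsigned int v, int hex)
{
	return snprintf(buf, len, hex ? "%#x" : "%u", v);
}

/* 64-bit variant: the "#" flag goes before the PRIx64 length/conversion. */
static int fmt_u64(char *buf, size_t len, uint64_t v, int hex)
{
	return snprintf(buf, len, hex ? "%#" PRIx64 : "%" PRIu64, v);
}
```

One detail worth knowing: with "%#x" a value of zero is printed as plain "0", without the 0x prefix.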
^ permalink raw reply
* Re: [PATCH] nfc: nfcmrvl: spi: Fix irq_of_parse_and_map() return value
From: Jakub Kicinski @ 2022-04-22 23:13 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: Vincent Cuissard, Samuel Ortiz, linux-nfc, netdev, linux-kernel
In-Reply-To: <20220422104758.64039-1-krzysztof.kozlowski@linaro.org>
On Fri, 22 Apr 2022 12:47:58 +0200 Krzysztof Kozlowski wrote:
> The irq_of_parse_and_map() returns 0 on failure, not a negative ERRNO.
>
> Fixes: caf6e49bf6d0 ("NFC: nfcmrvl: add spi driver")
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
> This is another issue to https://lore.kernel.org/all/20220422084605.2775542-1-lv.ruyi@zte.com.cn/
Maybe send one patch that fixes both and also fixes the usage of ret?
> diff --git a/drivers/nfc/nfcmrvl/spi.c b/drivers/nfc/nfcmrvl/spi.c
> index a38e2fcdfd39..01f0a08a381c 100644
> --- a/drivers/nfc/nfcmrvl/spi.c
> +++ b/drivers/nfc/nfcmrvl/spi.c
> @@ -115,7 +115,7 @@ static int nfcmrvl_spi_parse_dt(struct device_node *node,
> }
>
> ret = irq_of_parse_and_map(node, 0);
> - if (ret < 0) {
> + if (!ret) {
> pr_err("Unable to get irq, error: %d\n", ret);
> return ret;
> }
^ permalink raw reply
* Re: [Patch net-next v3 0/2] add ethtool SQI support for LAN87xx T1 Phy
From: patchwork-bot+netdevbpf @ 2022-04-22 23:40 UTC (permalink / raw)
To: Arun Ramadoss
Cc: netdev, linux-kernel, pabeni, kuba, davem, linux, hkallweit1,
andrew, UNGLinuxDriver
In-Reply-To: <20220420152016.9680-1-arun.ramadoss@microchip.com>
Hello:
This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 20 Apr 2022 20:50:14 +0530 you wrote:
> This patch series adds the Signal Quality Index measurement for the LAN87xx and
> LAN937x T1 phy. Updated the maintainers file for microchip_t1.c.
>
> v2 - v3
> ------
> Rebased to latest commit
>
> [...]
Here is the summary with links:
- [net-next,v3,1/2] net: phy: LAN87xx: add ethtool SQI support
https://git.kernel.org/netdev/net-next/c/b649695248b1
- [net-next,v3,2/2] MAINTAINERS: Add maintainers for Microchip T1 Phy driver
https://git.kernel.org/netdev/net-next/c/58f373f8d787
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH] net: ethernet: stmmac: fix write to sgmii_adapter_base
From: patchwork-bot+netdevbpf @ 2022-04-22 23:40 UTC (permalink / raw)
To: Dinh Nguyen; +Cc: davem, netdev, linux-kernel, stable
In-Reply-To: <20220420152345.27415-1-dinguyen@kernel.org>
Hello:
This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 20 Apr 2022 10:23:45 -0500 you wrote:
> I made a mistake with the commit a6aaa0032424 ("net: ethernet: stmmac:
> fix altr_tse_pcs function when using a fixed-link"). I should have
> tested against both scenarios: having an SGMII interface and not having
> one.
>
> Without the SGMII PCS TSE adapter, the sgmii_adapter_base address is
> NULL, thus a write to this address will fail.
>
> [...]
Here is the summary with links:
- net: ethernet: stmmac: fix write to sgmii_adapter_base
https://git.kernel.org/netdev/net/c/5fd1fe4807f9
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH][next] net: hns3: Fix spelling mistake "actvie" -> "active"
From: patchwork-bot+netdevbpf @ 2022-04-22 23:50 UTC (permalink / raw)
To: Colin Ian King
Cc: yisen.zhuang, salil.mehta, davem, kuba, pabeni, netdev,
kernel-janitors, linux-kernel
In-Reply-To: <20220421085546.321792-1-colin.i.king@gmail.com>
Hello:
This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 21 Apr 2022 09:55:46 +0100 you wrote:
> There is a spelling mistake in a netdev_info message. Fix it.
>
> Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
> ---
> drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Here is the summary with links:
- [next] net: hns3: Fix spelling mistake "actvie" -> "active"
https://git.kernel.org/netdev/net-next/c/31693d02b06e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH] tsnep: Remove useless null check before call of_node_put()
From: patchwork-bot+netdevbpf @ 2022-04-22 23:50 UTC (permalink / raw)
To: Haowen Bai; +Cc: davem, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <1650509283-26168-1-git-send-email-baihaowen@meizu.com>
Hello:
This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 21 Apr 2022 10:48:03 +0800 you wrote:
> There is no need for a null check before calling of_node_put(), since
> of_node_put() already performs one.
>
> Signed-off-by: Haowen Bai <baihaowen@meizu.com>
> ---
> drivers/net/ethernet/engleder/tsnep_main.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
Here is the summary with links:
- tsnep: Remove useless null check before call of_node_put()
https://git.kernel.org/netdev/net-next/c/f28c47bb9fd3
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [patch iproute2-next] devlink: introduce -h[ex] cmdline option to allow dumping numbers in hex format
From: David Ahern @ 2022-04-23 0:20 UTC (permalink / raw)
To: Shannon Nelson, Jiri Pirko, netdev; +Cc: sthemmin
In-Reply-To: <56b4d3e4-0274-10d8-0746-954750eac085@pensando.io>
On 4/22/22 3:36 PM, Shannon Nelson wrote:
>> @@ -9053,6 +9056,7 @@ int main(int argc, char **argv)
>> { "statistics", no_argument, NULL, 's' },
>> { "Netns", required_argument, NULL, 'N' },
>> { "iec", no_argument, NULL, 'i' },
>> + { "hex", no_argument, NULL, 'h' },
>
> Can we use 'x' instead of 'h' here? Most times '-h' means 'help', and
> might surprise unsuspecting users when it isn't a help flag.
>
agreed. -h almost always means help
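
A rough sketch of the suggested mapping, using getopt_long() with -x/--hex for hex output and -h left for help. The function name and the option table are hypothetical and much smaller than devlink's real table.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Report whether hex output was requested via -x or --hex. */
static bool parse_hex_flag(int argc, char **argv)
{
	static const struct option long_opts[] = {
		{ "hex",  no_argument, NULL, 'x' },
		{ "help", no_argument, NULL, 'h' },
		{ NULL, 0, NULL, 0 },
	};
	bool hex = false;
	int opt;

	optind = 1; /* restart scanning so the function can be called again */
	while ((opt = getopt_long(argc, argv, "xh", long_opts, NULL)) != -1) {
		if (opt == 'x')
			hex = true;
	}
	return hex;
}
```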
^ permalink raw reply
* Re: [PATCH] WireGuard: restrict packet handling to non-isolated CPUs.
From: Jason A. Donenfeld @ 2022-04-23 1:08 UTC (permalink / raw)
To: Charles-François Natali
Cc: wireguard, netdev, linux-crypto, Daniel Jordan, Steffen Klassert
In-Reply-To: <CAH_1eM2ECPKLcHAKQ-RNf4Zj5hrgT-aJ9pjTKfChf9fnZp5Vkw@mail.gmail.com>
Hi Charles,
On Fri, Apr 22, 2022 at 11:23:01PM +0100, Charles-François Natali wrote:
> > Regarding your patch, is there a way to make that a bit more succinct,
> > without introducing all of those helper functions? It seems awfully
> > verbose for something that seems like a matter of replacing the online
> > mask with the housekeeping mask.
>
> Indeed, I wasn't really happy about that.
> The reason I've written those helper functions is that the housekeeping mask
> includes possible CPUs (cpu_possible_mask), so unfortunately it's not just a
> matter of e.g. replacing cpu_online_mask with
> housekeeping_cpumask(HK_FLAG_DOMAIN), we have to perform an AND
> whenever we compute the weight, find the next CPU in the mask etc.
>
> And I'd rather have the operations and mask in a single location instead of
> scattered throughout the code, to make it easier to understand and maintain.
>
> Happy to change to something more inline though, or open to suggestions.
Probably more inlined, yea. A simpler version of your patch would
probably be something like this, right?
diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h
index 583adb37ee1e..b3117cdd647d 100644
--- a/drivers/net/wireguard/queueing.h
+++ b/drivers/net/wireguard/queueing.h
@@ -112,6 +112,8 @@ static inline int wg_cpumask_choose_online(int *stored_cpu, unsigned int id)
cpu = cpumask_first(cpu_online_mask);
for (i = 0; i < cpu_index; ++i)
cpu = cpumask_next(cpu, cpu_online_mask);
+ while (!housekeeping_test_cpu(cpu, HK_???))
+ cpu = cpumask_next(cpu, cpu_online_mask);
*stored_cpu = cpu;
}
return cpu;
@@ -128,7 +130,7 @@ static inline int wg_cpumask_next_online(int *next)
{
int cpu = *next;
- while (unlikely(!cpumask_test_cpu(cpu, cpu_online_mask)))
+ while (unlikely(!cpumask_test_cpu(cpu, cpu_online_mask) || !housekeeping_test_cpu(cpu, HK_???)))
cpu = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits;
*next = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits;
return cpu;
However, from looking at kernel/sched/isolation.c a bit, I noticed that
indeed you're right that most of these functions (save one) are based on
cpu_possible_mask rather than cpu_online_mask. This is frustrating
because the code makes smart use of static branches to remain quick, but
ANDing housekeeping_cpumask() with cpu_online_mask would, in the fast
path, wind up ANDing cpu_online_mask with cpu_possible_mask, which is
silly and pointless. That makes me suspect that maybe the best approach
would be adding a relevant helper to kernel/sched/isolation.c, so that
the helper can then do the `if (static_branch_unlikely(&housekeeping_overridden))`
stuff internally.
Or maybe you'll do some measurements and decide that just [ab]using
housekeeping_test_cpu() like above is actually optimal? Not really sure
myself.
Anyway, I'll keep an eye out for your joint wireguard/padata series. Be
sure to CC the people who wrote the isolation & housekeeping code, as
they likely have opinions about this stuff (and certainly know more than
me about it).
Jason
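
The "AND the housekeeping mask with the online mask" idea discussed above can be modelled in userspace with plain 64-bit bitmaps. This is a sketch under stated assumptions, not kernel code: the function names are hypothetical, the masks are ordinary integers rather than struct cpumask, and the selection rule (id modulo the number of usable CPUs) is assumed for illustration.

```c
#include <stdint.h>

/* Count set bits in a 64-bit mask (stand-in for a cpumask weight). */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Pick the (id % weight)-th CPU that is both online and a housekeeping CPU.
 * Returns -1 if no CPU qualifies.
 */
static int choose_cpu(uint64_t online, uint64_t housekeeping, unsigned int id)
{
	uint64_t usable = online & housekeeping;
	unsigned int weight = mask_weight(usable);
	unsigned int index, cpu;

	if (!weight)
		return -1;

	index = id % weight;
	for (cpu = 0; cpu < 64; cpu++) {
		if (!(usable & (UINT64_C(1) << cpu)))
			continue;
		if (index-- == 0)
			return (int)cpu;
	}
	return -1; /* not reached when weight > 0 */
}
```

Computing the combined mask once keeps the weight and the iteration consistent with each other, which is the property the helper functions in the original patch were after.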
^ permalink raw reply related
* RE: [PATCH net-next 2/4] net: stmmac: introduce PHY-less setup support
From: Ong, Boon Leong @ 2022-04-23 1:13 UTC (permalink / raw)
To: Andrew Lunn
Cc: Alexandre Torgue, Jose Abreu, Heiner Kallweit, Russell King,
Paolo Abeni, David S . Miller, Jakub Kicinski, Maxime Coquelin,
Alexandre Torgue, Giuseppe Cavallaro, netdev@vger.kernel.org,
linux-stm32@st-md-mailman.stormreply.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
In-Reply-To: <YmKmifSfqRdjOXSd@lunn.ch>
>What you need to do is extend your DSD to list the fixed-link. See
>
>https://www.kernel.org/doc/html/latest/firmware-guide/acpi/dsd/phy.html#mac-node-example-with-a-fixed-link-subnode
>
Thanks for the feedback. I will explore this with the project's BIOS
supplier.
^ permalink raw reply