* [PATCH net 0/2] mlx5 misc fixes 2025-05-22 @ 2025-05-22 6:28 Tariq Toukan 2025-05-22 6:28 ` [PATCH net 1/2] net/mlx5: Ensure fw pages are always allocated on same NUMA Tariq Toukan 2025-05-22 6:28 ` [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object Tariq Toukan 0 siblings, 2 replies; 7+ messages in thread From: Tariq Toukan @ 2025-05-22 6:28 UTC (permalink / raw) To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn Cc: Saeed Mahameed, Tariq Toukan, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman Hi, This small patchset provides misc bug fixes from the team to the mlx5 core and EN drivers. Thanks, Tariq. Jianbo Liu (1): net/mlx5e: Fix leak of Geneve TLV option object Moshe Shemesh (1): net/mlx5: Ensure fw pages are always allocated on same NUMA drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 ++++--- drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) base-commit: 407e0efdf8baf1672876d5948b75049860a93e59 -- 2.31.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net 1/2] net/mlx5: Ensure fw pages are always allocated on same NUMA 2025-05-22 6:28 [PATCH net 0/2] mlx5 misc fixes 2025-05-22 Tariq Toukan @ 2025-05-22 6:28 ` Tariq Toukan 2025-05-22 8:57 ` Dawid Osuchowski 2025-05-22 6:28 ` [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object Tariq Toukan 1 sibling, 1 reply; 7+ messages in thread From: Tariq Toukan @ 2025-05-22 6:28 UTC (permalink / raw) To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn Cc: Saeed Mahameed, Tariq Toukan, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman From: Moshe Shemesh <moshe@nvidia.com> When firmware asks the driver to allocate more pages, using event of give_pages, the driver should always allocate it from same NUMA, the original device NUMA. Current code uses dev_to_node() which can result in different NUMA as it is changed by other driver flows, such as mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for allocating firmware pages. Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> --- drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c index 972e8e9df585..9bc9bd83c232 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c @@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function) static int alloc_system_page(struct mlx5_core_dev *dev, u32 function) { struct device *device = mlx5_core_dma_dev(dev); - int nid = dev_to_node(device); + int nid = dev->priv.numa_node; struct page *page; u64 zero_addr = 1; u64 addr; -- 2.31.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] net/mlx5: Ensure fw pages are always allocated on same NUMA 2025-05-22 6:28 ` [PATCH net 1/2] net/mlx5: Ensure fw pages are always allocated on same NUMA Tariq Toukan @ 2025-05-22 8:57 ` Dawid Osuchowski 0 siblings, 0 replies; 7+ messages in thread From: Dawid Osuchowski @ 2025-05-22 8:57 UTC (permalink / raw) To: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn Cc: Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman On 2025-05-22 8:28 AM, Tariq Toukan wrote: > From: Moshe Shemesh <moshe@nvidia.com> > > When firmware asks the driver to allocate more pages, using event of > give_pages, the driver should always allocate it from same NUMA, the > original device NUMA. Current code uses dev_to_node() which can result > in different NUMA as it is changed by other driver flows, such as > mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for > allocating firmware pages. > > Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node") > Signed-off-by: Moshe Shemesh <moshe@nvidia.com> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c > index 972e8e9df585..9bc9bd83c232 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c > @@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function) > static int alloc_system_page(struct mlx5_core_dev *dev, u32 function) > { > struct device *device = mlx5_core_dma_dev(dev); > - int nid = dev_to_node(device); > + int nid = dev->priv.numa_node; Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Thanks, Dawid > struct page *page; > u64 zero_addr = 1; > u64 addr; ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object 2025-05-22 6:28 [PATCH net 0/2] mlx5 misc fixes 2025-05-22 Tariq Toukan 2025-05-22 6:28 ` [PATCH net 1/2] net/mlx5: Ensure fw pages are always allocated on same NUMA Tariq Toukan @ 2025-05-22 6:28 ` Tariq Toukan 2025-05-22 19:16 ` Simon Horman 1 sibling, 1 reply; 7+ messages in thread From: Tariq Toukan @ 2025-05-22 6:28 UTC (permalink / raw) To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn Cc: Saeed Mahameed, Tariq Toukan, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman, Jianbo Liu From: Jianbo Liu <jianbol@nvidia.com> Previously, a unique tunnel id was added for the matching on TC non-zero chains, to support inner header rewrite with goto action. Later, it was used to support VF tunnel offload for vxlan, then for Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is used to parse tunnel options. For Geneve, if there is TLV option, a object is created, or refcnt is added if already exists. But the temporary mlx5_flow_spec is directly freed after parsing, which causes the leak because no information regarding the object is saved in flow's mlx5_flow_spec, which is used to free the object when deleting the flow. To fix the leak, call mlx5_geneve_tlv_option_del() before free the temporary spec if it has TLV object. Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index f1d908f61134..b9c1d7f8f05c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -2028,9 +2028,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, return err; } -static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow) +static bool mlx5_flow_has_geneve_opt(struct mlx5_flow_spec *spec) { - struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec; void *headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, misc_parameters_3); @@ -2069,7 +2068,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, } complete_all(&flow->del_hw_done); - if (mlx5_flow_has_geneve_opt(flow)) + if (mlx5_flow_has_geneve_opt(&attr->parse_attr->spec)) mlx5_geneve_tlv_option_del(priv->mdev->geneve); if (flow->decap_route) @@ -2580,6 +2579,8 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, return err; } err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec); + if (mlx5_flow_has_geneve_opt(tmp_spec)) + mlx5_geneve_tlv_option_del(priv->mdev->geneve); kvfree(tmp_spec); if (err) return err; -- 2.31.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object 2025-05-22 6:28 ` [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object Tariq Toukan @ 2025-05-22 19:16 ` Simon Horman 2025-05-23 1:58 ` Jianbo Liu 0 siblings, 1 reply; 7+ messages in thread From: Simon Horman @ 2025-05-22 19:16 UTC (permalink / raw) To: Tariq Toukan Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn, Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman, Jianbo Liu On Thu, May 22, 2025 at 09:28:06AM +0300, Tariq Toukan wrote: > From: Jianbo Liu <jianbol@nvidia.com> > > Previously, a unique tunnel id was added for the matching on TC > non-zero chains, to support inner header rewrite with goto action. > Later, it was used to support VF tunnel offload for vxlan, then for > Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is > used to parse tunnel options. For Geneve, if there is TLV option, a > object is created, or refcnt is added if already exists. But the > temporary mlx5_flow_spec is directly freed after parsing, which causes > the leak because no information regarding the object is saved in > flow's mlx5_flow_spec, which is used to free the object when deleting > the flow. > > To fix the leak, call mlx5_geneve_tlv_option_del() before free the > temporary spec if it has TLV object. > > Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload") > Signed-off-by: Jianbo Liu <jianbol@nvidia.com> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > index f1d908f61134..b9c1d7f8f05c 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > @@ -2028,9 +2028,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, > return err; > } > > -static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow) > +static bool mlx5_flow_has_geneve_opt(struct mlx5_flow_spec *spec) > { > - struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec; > void *headers_v = MLX5_ADDR_OF(fte_match_param, > spec->match_value, > misc_parameters_3); > @@ -2069,7 +2068,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, > } > complete_all(&flow->del_hw_done); > > - if (mlx5_flow_has_geneve_opt(flow)) > + if (mlx5_flow_has_geneve_opt(&attr->parse_attr->spec)) > mlx5_geneve_tlv_option_del(priv->mdev->geneve); > > if (flow->decap_route) Hi, The lines leading up to the hung below are: err = mlx5e_tc_tun_parse(filter_dev, priv, tmp_spec, f, match_level); if (err) { kvfree(tmp_spec); NL_SET_ERR_MSG_MOD(extack, "Failed to parse tunnel attributes"); netdev_warn(priv->netdev, "Failed to parse tunnel attributes"); I am wondering if the same resource leak described in the patch description can occur if mlx5e_tc_tun_parse() fails after it successfully calls tunnel->parse_tunnel(). > @@ -2580,6 +2579,8 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, > return err; > } > err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec); > + if (mlx5_flow_has_geneve_opt(tmp_spec)) > + mlx5_geneve_tlv_option_del(priv->mdev->geneve); > kvfree(tmp_spec); > if (err) > return err; > -- > 2.31.1 > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object 2025-05-22 19:16 ` Simon Horman @ 2025-05-23 1:58 ` Jianbo Liu 2025-05-23 7:19 ` Simon Horman 0 siblings, 1 reply; 7+ messages in thread From: Jianbo Liu @ 2025-05-23 1:58 UTC (permalink / raw) To: Simon Horman, Tariq Toukan Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn, Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman On 5/23/2025 3:16 AM, Simon Horman wrote: > On Thu, May 22, 2025 at 09:28:06AM +0300, Tariq Toukan wrote: >> From: Jianbo Liu <jianbol@nvidia.com> >> >> Previously, a unique tunnel id was added for the matching on TC >> non-zero chains, to support inner header rewrite with goto action. >> Later, it was used to support VF tunnel offload for vxlan, then for >> Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is >> used to parse tunnel options. For Geneve, if there is TLV option, a >> object is created, or refcnt is added if already exists. But the >> temporary mlx5_flow_spec is directly freed after parsing, which causes >> the leak because no information regarding the object is saved in >> flow's mlx5_flow_spec, which is used to free the object when deleting >> the flow. >> >> To fix the leak, call mlx5_geneve_tlv_option_del() before free the >> temporary spec if it has TLV object. >> >> Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload") >> Signed-off-by: Jianbo Liu <jianbol@nvidia.com> >> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> >> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> >> --- >> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c >> index f1d908f61134..b9c1d7f8f05c 100644 >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c >> @@ -2028,9 +2028,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, >> return err; >> } >> >> -static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow) >> +static bool mlx5_flow_has_geneve_opt(struct mlx5_flow_spec *spec) >> { >> - struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec; >> void *headers_v = MLX5_ADDR_OF(fte_match_param, >> spec->match_value, >> misc_parameters_3); >> @@ -2069,7 +2068,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, >> } >> complete_all(&flow->del_hw_done); >> >> - if (mlx5_flow_has_geneve_opt(flow)) >> + if (mlx5_flow_has_geneve_opt(&attr->parse_attr->spec)) >> mlx5_geneve_tlv_option_del(priv->mdev->geneve); >> >> if (flow->decap_route) > > Hi, > > The lines leading up to the hung below are: > > err = mlx5e_tc_tun_parse(filter_dev, priv, tmp_spec, f, match_level); > if (err) { > kvfree(tmp_spec); > NL_SET_ERR_MSG_MOD(extack, "Failed to parse tunnel attributes"); > netdev_warn(priv->netdev, "Failed to parse tunnel attributes"); > > I am wondering if the same resource leak described in the patch description > can occur if mlx5e_tc_tun_parse() fails after it successfully calls > tunnel->parse_tunnel(). > Yes, I missed that. I will fix in next version. Thanks! Jianbo >> @@ -2580,6 +2579,8 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv, >> return err; >> } >> err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec); >> + if (mlx5_flow_has_geneve_opt(tmp_spec)) >> + mlx5_geneve_tlv_option_del(priv->mdev->geneve); >> kvfree(tmp_spec); >> if (err) >> return err; >> -- >> 2.31.1 >> >> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object 2025-05-23 1:58 ` Jianbo Liu @ 2025-05-23 7:19 ` Simon Horman 0 siblings, 0 replies; 7+ messages in thread From: Simon Horman @ 2025-05-23 7:19 UTC (permalink / raw) To: Jianbo Liu Cc: Tariq Toukan, David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Andrew Lunn, Saeed Mahameed, Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh, Mark Bloch, Gal Pressman On Fri, May 23, 2025 at 09:58:57AM +0800, Jianbo Liu wrote: > > > On 5/23/2025 3:16 AM, Simon Horman wrote: > > On Thu, May 22, 2025 at 09:28:06AM +0300, Tariq Toukan wrote: > > > From: Jianbo Liu <jianbol@nvidia.com> > > > > > > Previously, a unique tunnel id was added for the matching on TC > > > non-zero chains, to support inner header rewrite with goto action. > > > Later, it was used to support VF tunnel offload for vxlan, then for > > > Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec is > > > used to parse tunnel options. For Geneve, if there is TLV option, a > > > object is created, or refcnt is added if already exists. But the > > > temporary mlx5_flow_spec is directly freed after parsing, which causes > > > the leak because no information regarding the object is saved in > > > flow's mlx5_flow_spec, which is used to free the object when deleting > > > the flow. > > > > > > To fix the leak, call mlx5_geneve_tlv_option_del() before free the > > > temporary spec if it has TLV object. > > > > > > Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload") > > > Signed-off-by: Jianbo Liu <jianbol@nvidia.com> > > > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> > > > Signed-off-by: Tariq Toukan <tariqt@nvidia.com> > > > --- > > > drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 7 ++++--- > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > index f1d908f61134..b9c1d7f8f05c 100644 > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > > > @@ -2028,9 +2028,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, > > > return err; > > > } > > > -static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow) > > > +static bool mlx5_flow_has_geneve_opt(struct mlx5_flow_spec *spec) > > > { > > > - struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec; > > > void *headers_v = MLX5_ADDR_OF(fte_match_param, > > > spec->match_value, > > > misc_parameters_3); > > > @@ -2069,7 +2068,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv, > > > } > > > complete_all(&flow->del_hw_done); > > > - if (mlx5_flow_has_geneve_opt(flow)) > > > + if (mlx5_flow_has_geneve_opt(&attr->parse_attr->spec)) > > > mlx5_geneve_tlv_option_del(priv->mdev->geneve); > > > if (flow->decap_route) > > > > Hi, > > > > The lines leading up to the hung below are: > > > > err = mlx5e_tc_tun_parse(filter_dev, priv, tmp_spec, f, match_level); > > if (err) { > > kvfree(tmp_spec); > > NL_SET_ERR_MSG_MOD(extack, "Failed to parse tunnel attributes"); > > netdev_warn(priv->netdev, "Failed to parse tunnel attributes"); > > > > I am wondering if the same resource leak described in the patch description > > can occur if mlx5e_tc_tun_parse() fails after it successfully calls > > tunnel->parse_tunnel(). > > > > Yes, I missed that. I will fix in next version. Thanks, much appreciated. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-05-23 7:19 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-22 6:28 [PATCH net 0/2] mlx5 misc fixes 2025-05-22 Tariq Toukan 2025-05-22 6:28 ` [PATCH net 1/2] net/mlx5: Ensure fw pages are always allocated on same NUMA Tariq Toukan 2025-05-22 8:57 ` Dawid Osuchowski 2025-05-22 6:28 ` [PATCH net 2/2] net/mlx5e: Fix leak of Geneve TLV option object Tariq Toukan 2025-05-22 19:16 ` Simon Horman 2025-05-23 1:58 ` Jianbo Liu 2025-05-23 7:19 ` Simon Horman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).