* [PATCH net-next] net/smc: add support for netdevice in containers. @ 2023-09-25 2:35 Albert Huang 2023-09-26 10:48 ` Leon Romanovsky ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Albert Huang @ 2023-09-25 2:35 UTC (permalink / raw) To: Karsten Graul, Wenjia Zhang, Jan Karcher Cc: Albert Huang, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel If the netdevice is within a container and communicates externally through network technologies like VXLAN, we won't be able to find routing information in the init_net namespace. To address this issue, we need to add a struct net parameter to the smc_ib_find_route function. This allow us to locate the routing information within the corresponding net namespace, ensuring the correct completion of the SMC CLC interaction. Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> --- net/smc/af_smc.c | 3 ++- net/smc/smc_ib.c | 7 ++++--- net/smc/smc_ib.h | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index bacdd971615e..7a874da90c7f 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, (struct smc_clc_msg_accept_confirm_v2 *)aclc; struct smc_clc_first_contact_ext *fce = smc_get_clc_first_contact_ext(clc_v2, false); + struct net *net = sock_net(&smc->sk); int rc; if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) @@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); ini->smcrv2.uses_gateway = false; } else { - if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr, + if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), ini->smcrv2.nexthop_mac, &ini->smcrv2.uses_gateway)) diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c index 9b66d6aeeb1a..89981dbe46c9 100644 --- a/net/smc/smc_ib.c +++ b/net/smc/smc_ib.c @@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; } -int smc_ib_find_route(__be32 saddr, __be32 daddr, +int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, u8 nexthop_mac[], u8 *uses_gateway) { struct neighbour *neigh = NULL; @@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr, if (daddr == cpu_to_be32(INADDR_NONE)) goto out; - rt = ip_route_output_flow(&init_net, &fl4, NULL); + rt = ip_route_output_flow(net, &fl4, NULL); if (IS_ERR(rt)) goto out; if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET) @@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) { struct in_device *in_dev = __in_dev_get_rcu(ndev); + struct net *net = dev_net(ndev); const struct in_ifaddr *ifa; bool subnet_match = false; @@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, } if (!subnet_match) goto out; - if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr, + if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr, smcrv2->daddr, smcrv2->nexthop_mac, &smcrv2->uses_gateway)) diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h index 4df5f8c8a0a1..ef8ac2b7546d 100644 --- a/net/smc/smc_ib.h +++ b/net/smc/smc_ib.h @@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk, int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport, unsigned short vlan_id, u8 gid[], u8 *sgid_index, struct smc_init_info_smcrv2 *smcrv2); -int smc_ib_find_route(__be32 saddr, __be32 daddr, +int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, u8 nexthop_mac[], u8 *uses_gateway); bool smc_ib_is_valid_local_systemid(void); int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb); -- 2.37.1 (Apple Git-137.1) ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-25 2:35 [PATCH net-next] net/smc: add support for netdevice in containers Albert Huang @ 2023-09-26 10:48 ` Leon Romanovsky 2023-09-26 11:14 ` Alexandra Winter 2023-09-27 3:42 ` Dust Li 2023-09-28 15:04 ` Niklas Schnelle 2 siblings, 1 reply; 18+ messages in thread From: Leon Romanovsky @ 2023-09-26 10:48 UTC (permalink / raw) To: Albert Huang Cc: Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel, RDMA mailing list, Jason Gunthorpe On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: > If the netdevice is within a container and communicates externally > through network technologies like VXLAN, we won't be able to find > routing information in the init_net namespace. To address this issue, > we need to add a struct net parameter to the smc_ib_find_route function. > This allow us to locate the routing information within the corresponding > net namespace, ensuring the correct completion of the SMC CLC interaction. > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> > --- > net/smc/af_smc.c | 3 ++- > net/smc/smc_ib.c | 7 ++++--- > net/smc/smc_ib.h | 2 +- > 3 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c > index bacdd971615e..7a874da90c7f 100644 > --- a/net/smc/af_smc.c > +++ b/net/smc/af_smc.c > @@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > (struct smc_clc_msg_accept_confirm_v2 *)aclc; > struct smc_clc_first_contact_ext *fce = > smc_get_clc_first_contact_ext(clc_v2, false); > + struct net *net = sock_net(&smc->sk); > int rc; > > if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) > @@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); > ini->smcrv2.uses_gateway = false; > } else { > - if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr, > + if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, > smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), > ini->smcrv2.nexthop_mac, > &ini->smcrv2.uses_gateway)) > diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c > index 9b66d6aeeb1a..89981dbe46c9 100644 > --- a/net/smc/smc_ib.c > +++ b/net/smc/smc_ib.c > @@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) > return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; > } > > -int smc_ib_find_route(__be32 saddr, __be32 daddr, > +int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > u8 nexthop_mac[], u8 *uses_gateway) > { > struct neighbour *neigh = NULL; > @@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr, > > if (daddr == cpu_to_be32(INADDR_NONE)) > goto out; > - rt = ip_route_output_flow(&init_net, &fl4, NULL); > + rt = ip_route_output_flow(net, &fl4, NULL); This patch made me wonder, why doesn't SMC use RDMA-CM like all other in-kernel ULPs which work over RDMA? Thanks ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-26 10:48 ` Leon Romanovsky @ 2023-09-26 11:14 ` Alexandra Winter 2023-09-26 11:41 ` Leon Romanovsky 0 siblings, 1 reply; 18+ messages in thread From: Alexandra Winter @ 2023-09-26 11:14 UTC (permalink / raw) To: Leon Romanovsky, Albert Huang Cc: Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel, RDMA mailing list, Jason Gunthorpe On 26.09.23 12:48, Leon Romanovsky wrote: > This patch made me wonder, why doesn't SMC use RDMA-CM like all other > in-kernel ULPs which work over RDMA? > > Thanks The idea behind SMC is that it should look an feel to the applications like TCP sockets. So for connection management it uses TCP over IP; RDMA is just used for the data transfer. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-26 11:14 ` Alexandra Winter @ 2023-09-26 11:41 ` Leon Romanovsky 2023-09-26 12:09 ` Dust Li 0 siblings, 1 reply; 18+ messages in thread From: Leon Romanovsky @ 2023-09-26 11:41 UTC (permalink / raw) To: Alexandra Winter Cc: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel, RDMA mailing list, Jason Gunthorpe On Tue, Sep 26, 2023 at 01:14:04PM +0200, Alexandra Winter wrote: > > > On 26.09.23 12:48, Leon Romanovsky wrote: > > This patch made me wonder, why doesn't SMC use RDMA-CM like all other > > in-kernel ULPs which work over RDMA? > > > > Thanks > > The idea behind SMC is that it should look an feel to the applications > like TCP sockets. So for connection management it uses TCP over IP; > RDMA is just used for the data transfer. I think that it is not different from other ULPs. For example, RDS works over sockets and doesn't touch or reimplement GID management logic. Thanks ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-26 11:41 ` Leon Romanovsky @ 2023-09-26 12:09 ` Dust Li 2023-09-26 17:30 ` Leon Romanovsky 0 siblings, 1 reply; 18+ messages in thread From: Dust Li @ 2023-09-26 12:09 UTC (permalink / raw) To: Leon Romanovsky, Alexandra Winter Cc: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel, RDMA mailing list, Jason Gunthorpe On Tue, Sep 26, 2023 at 02:41:04PM +0300, Leon Romanovsky wrote: >On Tue, Sep 26, 2023 at 01:14:04PM +0200, Alexandra Winter wrote: >> >> >> On 26.09.23 12:48, Leon Romanovsky wrote: >> > This patch made me wonder, why doesn't SMC use RDMA-CM like all other >> > in-kernel ULPs which work over RDMA? >> > >> > Thanks >> >> The idea behind SMC is that it should look an feel to the applications >> like TCP sockets. So for connection management it uses TCP over IP; >> RDMA is just used for the data transfer. > >I think that it is not different from other ULPs. For example, RDS works >over sockets and doesn't touch or reimplement GID management logic. I think the difference is SMC socket need to be compatible with TCP socket, so it need a tcp socket to fallback when something is not working. If SMC works with rdmacm, it still need a fallback-to-tcp socket, and the tcp connection has to be established for each SMC socket before the SMC socket got established, that would make rdmacm meaningless. Best regards, Dust > >Thanks ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-26 12:09 ` Dust Li @ 2023-09-26 17:30 ` Leon Romanovsky 0 siblings, 0 replies; 18+ messages in thread From: Leon Romanovsky @ 2023-09-26 17:30 UTC (permalink / raw) To: Dust Li Cc: Alexandra Winter, Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel, RDMA mailing list, Jason Gunthorpe On Tue, Sep 26, 2023 at 08:09:03PM +0800, Dust Li wrote: > On Tue, Sep 26, 2023 at 02:41:04PM +0300, Leon Romanovsky wrote: > >On Tue, Sep 26, 2023 at 01:14:04PM +0200, Alexandra Winter wrote: > >> > >> > >> On 26.09.23 12:48, Leon Romanovsky wrote: > >> > This patch made me wonder, why doesn't SMC use RDMA-CM like all other > >> > in-kernel ULPs which work over RDMA? > >> > > >> > Thanks > >> > >> The idea behind SMC is that it should look an feel to the applications > >> like TCP sockets. So for connection management it uses TCP over IP; > >> RDMA is just used for the data transfer. > > > >I think that it is not different from other ULPs. For example, RDS works > >over sockets and doesn't touch or reimplement GID management logic. > > I think the difference is SMC socket need to be compatible with TCP > socket, so it need a tcp socket to fallback when something is not working. > > If SMC works with rdmacm, it still need a fallback-to-tcp socket, and > the tcp connection has to be established for each SMC socket before the > SMC socket got established, that would make rdmacm meaningless. You still need to perform device-GID-route translations [1], which sounds to me very RDMA-CM. I'm not asking you to rewrite the code, but trying to get rationale behind reimplementing part of RDMA subsystem. Thanks [1] 24fb68111d45 ("net/smc: retrieve v2 gid from IB device") > > Best regards, > Dust > > > > >Thanks ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-25 2:35 [PATCH net-next] net/smc: add support for netdevice in containers Albert Huang 2023-09-26 10:48 ` Leon Romanovsky @ 2023-09-27 3:42 ` Dust Li 2023-09-27 5:55 ` Leon Romanovsky 2023-10-03 10:41 ` Paolo Abeni 2023-09-28 15:04 ` Niklas Schnelle 2 siblings, 2 replies; 18+ messages in thread From: Dust Li @ 2023-09-27 3:42 UTC (permalink / raw) To: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: >If the netdevice is within a container and communicates externally >through network technologies like VXLAN, we won't be able to find >routing information in the init_net namespace. To address this issue, Thanks for your founding ! I think this is a more generic problem, but not just related to VXLAN ? If we use SMC-R v2 and the netdevice is in a net namespace which is not init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. Best regards, Dust >we need to add a struct net parameter to the smc_ib_find_route function. >This allow us to locate the routing information within the corresponding >net namespace, ensuring the correct completion of the SMC CLC interaction. > >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> >--- > net/smc/af_smc.c | 3 ++- > net/smc/smc_ib.c | 7 ++++--- > net/smc/smc_ib.h | 2 +- > 3 files changed, 7 insertions(+), 5 deletions(-) > >diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >index bacdd971615e..7a874da90c7f 100644 >--- a/net/smc/af_smc.c >+++ b/net/smc/af_smc.c >@@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > (struct smc_clc_msg_accept_confirm_v2 *)aclc; > struct smc_clc_first_contact_ext *fce = > smc_get_clc_first_contact_ext(clc_v2, false); >+ struct net *net = sock_net(&smc->sk); > int rc; > > if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) >@@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); > ini->smcrv2.uses_gateway = false; > } else { >- if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr, >+ if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, > smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), > ini->smcrv2.nexthop_mac, > &ini->smcrv2.uses_gateway)) >diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c >index 9b66d6aeeb1a..89981dbe46c9 100644 >--- a/net/smc/smc_ib.c >+++ b/net/smc/smc_ib.c >@@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) > return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; > } > >-int smc_ib_find_route(__be32 saddr, __be32 daddr, >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > u8 nexthop_mac[], u8 *uses_gateway) > { > struct neighbour *neigh = NULL; >@@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr, > > if (daddr == cpu_to_be32(INADDR_NONE)) > goto out; >- rt = ip_route_output_flow(&init_net, &fl4, NULL); >+ rt = ip_route_output_flow(net, &fl4, NULL); > if (IS_ERR(rt)) > goto out; > if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET) >@@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, > if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && > smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) { > struct in_device *in_dev = __in_dev_get_rcu(ndev); >+ struct net *net = dev_net(ndev); > const struct in_ifaddr *ifa; > bool subnet_match = false; > >@@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, > } > if (!subnet_match) > goto out; >- if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr, >+ if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr, > smcrv2->daddr, > smcrv2->nexthop_mac, > &smcrv2->uses_gateway)) >diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h >index 4df5f8c8a0a1..ef8ac2b7546d 100644 >--- a/net/smc/smc_ib.h >+++ b/net/smc/smc_ib.h >@@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk, > int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport, > unsigned short vlan_id, u8 gid[], u8 *sgid_index, > struct smc_init_info_smcrv2 *smcrv2); >-int smc_ib_find_route(__be32 saddr, __be32 daddr, >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > u8 nexthop_mac[], u8 *uses_gateway); > bool smc_ib_is_valid_local_systemid(void); > int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb); >-- >2.37.1 (Apple Git-137.1) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-27 3:42 ` Dust Li @ 2023-09-27 5:55 ` Leon Romanovsky 2023-09-27 12:17 ` Dust Li 2023-09-28 3:11 ` [External] " 黄杰 2023-10-03 10:41 ` Paolo Abeni 1 sibling, 2 replies; 18+ messages in thread From: Leon Romanovsky @ 2023-09-27 5:55 UTC (permalink / raw) To: Dust Li Cc: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote: > On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: > >If the netdevice is within a container and communicates externally > >through network technologies like VXLAN, we won't be able to find > >routing information in the init_net namespace. To address this issue, > > Thanks for your founding ! > > I think this is a more generic problem, but not just related to VXLAN ? > If we use SMC-R v2 and the netdevice is in a net namespace which is not > init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. BTW, does this patch take into account net namespace of ib_device? Thanks > > Best regards, > Dust > > >we need to add a struct net parameter to the smc_ib_find_route function. > >This allow us to locate the routing information within the corresponding > >net namespace, ensuring the correct completion of the SMC CLC interaction. > > > >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> > >--- > > net/smc/af_smc.c | 3 ++- > > net/smc/smc_ib.c | 7 ++++--- > > net/smc/smc_ib.h | 2 +- > > 3 files changed, 7 insertions(+), 5 deletions(-) > > > >diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c > >index bacdd971615e..7a874da90c7f 100644 > >--- a/net/smc/af_smc.c > >+++ b/net/smc/af_smc.c > >@@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > > (struct smc_clc_msg_accept_confirm_v2 *)aclc; > > struct smc_clc_first_contact_ext *fce = > > smc_get_clc_first_contact_ext(clc_v2, false); > >+ struct net *net = sock_net(&smc->sk); > > int rc; > > > > if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) > >@@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > > memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); > > ini->smcrv2.uses_gateway = false; > > } else { > >- if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr, > >+ if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, > > smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), > > ini->smcrv2.nexthop_mac, > > &ini->smcrv2.uses_gateway)) > >diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c > >index 9b66d6aeeb1a..89981dbe46c9 100644 > >--- a/net/smc/smc_ib.c > >+++ b/net/smc/smc_ib.c > >@@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) > > return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; > > } > > > >-int smc_ib_find_route(__be32 saddr, __be32 daddr, > >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > > u8 nexthop_mac[], u8 *uses_gateway) > > { > > struct neighbour *neigh = NULL; > >@@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr, > > > > if (daddr == cpu_to_be32(INADDR_NONE)) > > goto out; > >- rt = ip_route_output_flow(&init_net, &fl4, NULL); > >+ rt = ip_route_output_flow(net, &fl4, NULL); > > if (IS_ERR(rt)) > > goto out; > > if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET) > >@@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, > > if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && > > smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) { > > struct in_device *in_dev = __in_dev_get_rcu(ndev); > >+ struct net *net = dev_net(ndev); > > const struct in_ifaddr *ifa; > > bool subnet_match = false; > > > >@@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, > > } > > if (!subnet_match) > > goto out; > >- if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr, > >+ if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr, > > smcrv2->daddr, > > smcrv2->nexthop_mac, > > &smcrv2->uses_gateway)) > >diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h > >index 4df5f8c8a0a1..ef8ac2b7546d 100644 > >--- a/net/smc/smc_ib.h > >+++ b/net/smc/smc_ib.h > >@@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk, > > int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport, > > unsigned short vlan_id, u8 gid[], u8 *sgid_index, > > struct smc_init_info_smcrv2 *smcrv2); > >-int smc_ib_find_route(__be32 saddr, __be32 daddr, > >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > > u8 nexthop_mac[], u8 *uses_gateway); > > bool smc_ib_is_valid_local_systemid(void); > > int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb); > >-- > >2.37.1 (Apple Git-137.1) > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-27 5:55 ` Leon Romanovsky @ 2023-09-27 12:17 ` Dust Li 2023-09-28 9:51 ` Leon Romanovsky 2023-09-28 3:11 ` [External] " 黄杰 1 sibling, 1 reply; 18+ messages in thread From: Dust Li @ 2023-09-27 12:17 UTC (permalink / raw) To: Leon Romanovsky Cc: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Wed, Sep 27, 2023 at 08:55:28AM +0300, Leon Romanovsky wrote: >On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote: >> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: >> >If the netdevice is within a container and communicates externally >> >through network technologies like VXLAN, we won't be able to find >> >routing information in the init_net namespace. To address this issue, >> >> Thanks for your founding ! >> >> I think this is a more generic problem, but not just related to VXLAN ? >> If we use SMC-R v2 and the netdevice is in a net namespace which is not >> init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. > >BTW, does this patch take into account net namespace of ib_device? I think this patch is irrelevant with the netns of ib_device. SMC has a global smc_ib_devices list reported by ib_client, and checked the netns using rdma_dev_access_netns. So I think we should have handled that well. Best regards, Dust > >Thanks > >> >> Best regards, >> Dust >> >> >we need to add a struct net parameter to the smc_ib_find_route function. >> >This allow us to locate the routing information within the corresponding >> >net namespace, ensuring the correct completion of the SMC CLC interaction. >> > >> >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> >> >--- >> > net/smc/af_smc.c | 3 ++- >> > net/smc/smc_ib.c | 7 ++++--- >> > net/smc/smc_ib.h | 2 +- >> > 3 files changed, 7 insertions(+), 5 deletions(-) >> > >> >diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c >> >index bacdd971615e..7a874da90c7f 100644 >> >--- a/net/smc/af_smc.c >> >+++ b/net/smc/af_smc.c >> >@@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, >> > (struct smc_clc_msg_accept_confirm_v2 *)aclc; >> > struct smc_clc_first_contact_ext *fce = >> > smc_get_clc_first_contact_ext(clc_v2, false); >> >+ struct net *net = sock_net(&smc->sk); >> > int rc; >> > >> > if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) >> >@@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, >> > memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); >> > ini->smcrv2.uses_gateway = false; >> > } else { >> >- if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr, >> >+ if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, >> > smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), >> > ini->smcrv2.nexthop_mac, >> > &ini->smcrv2.uses_gateway)) >> >diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c >> >index 9b66d6aeeb1a..89981dbe46c9 100644 >> >--- a/net/smc/smc_ib.c >> >+++ b/net/smc/smc_ib.c >> >@@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) >> > return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; >> > } >> > >> >-int smc_ib_find_route(__be32 saddr, __be32 daddr, >> >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, >> > u8 nexthop_mac[], u8 *uses_gateway) >> > { >> > struct neighbour *neigh = NULL; >> >@@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr, >> > >> > if (daddr == cpu_to_be32(INADDR_NONE)) >> > goto out; >> >- rt = ip_route_output_flow(&init_net, &fl4, NULL); >> >+ rt = ip_route_output_flow(net, &fl4, NULL); >> > if (IS_ERR(rt)) >> > goto out; >> > if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET) >> >@@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, >> > if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && >> > smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) { >> > struct in_device *in_dev = __in_dev_get_rcu(ndev); >> >+ struct net *net = dev_net(ndev); >> > const struct in_ifaddr *ifa; >> > bool subnet_match = false; >> > >> >@@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, >> > } >> > if (!subnet_match) >> > goto out; >> >- if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr, >> >+ if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr, >> > smcrv2->daddr, >> > smcrv2->nexthop_mac, >> > &smcrv2->uses_gateway)) >> >diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h >> >index 4df5f8c8a0a1..ef8ac2b7546d 100644 >> >--- a/net/smc/smc_ib.h >> >+++ b/net/smc/smc_ib.h >> >@@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk, >> > int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport, >> > unsigned short vlan_id, u8 gid[], u8 *sgid_index, >> > struct smc_init_info_smcrv2 *smcrv2); >> >-int smc_ib_find_route(__be32 saddr, __be32 daddr, >> >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, >> > u8 nexthop_mac[], u8 *uses_gateway); >> > bool smc_ib_is_valid_local_systemid(void); >> > int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb); >> >-- >> >2.37.1 (Apple Git-137.1) >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-27 12:17 ` Dust Li @ 2023-09-28 9:51 ` Leon Romanovsky 0 siblings, 0 replies; 18+ messages in thread From: Leon Romanovsky @ 2023-09-28 9:51 UTC (permalink / raw) To: Dust Li Cc: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Wed, Sep 27, 2023 at 08:17:40PM +0800, Dust Li wrote: > On Wed, Sep 27, 2023 at 08:55:28AM +0300, Leon Romanovsky wrote: > >On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote: > >> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: > >> >If the netdevice is within a container and communicates externally > >> >through network technologies like VXLAN, we won't be able to find > >> >routing information in the init_net namespace. To address this issue, > >> > >> Thanks for your founding ! > >> > >> I think this is a more generic problem, but not just related to VXLAN ? > >> If we use SMC-R v2 and the netdevice is in a net namespace which is not > >> init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. > > > >BTW, does this patch take into account net namespace of ib_device? > > I think this patch is irrelevant with the netns of ib_device. > > SMC has a global smc_ib_devices list reported by ib_client, and checked > the netns using rdma_dev_access_netns. So I think we should have handled > that well. ok, I see Thanks, Reviewed-by: Leon Romanovsky <leonro@nvidia.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [External] Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-27 5:55 ` Leon Romanovsky 2023-09-27 12:17 ` Dust Li @ 2023-09-28 3:11 ` 黄杰 1 sibling, 0 replies; 18+ messages in thread From: 黄杰 @ 2023-09-28 3:11 UTC (permalink / raw) To: Leon Romanovsky Cc: Dust Li, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel Leon Romanovsky <leon@kernel.org> 于2023年9月27日周三 13:55写道: > > On Wed, Sep 27, 2023 at 11:42:09AM +0800, Dust Li wrote: > > On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: > > >If the netdevice is within a container and communicates externally > > >through network technologies like VXLAN, we won't be able to find > > >routing information in the init_net namespace. To address this issue, > > > > Thanks for your founding ! > > > > I think this is a more generic problem, but not just related to VXLAN ? > > If we use SMC-R v2 and the netdevice is in a net namespace which is not > > init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. > > BTW, does this patch take into account net namespace of ib_device? > > Thanks > As dust said, the ib_device works well. bool rdma_dev_access_netns(const struct ib_device *dev, const struct net *net) { return (ib_devices_shared_netns || net_eq(read_pnet(&dev->coredev.rdma_net), net)); } EXPORT_SYMBOL(rdma_dev_access_netns); thanks! BR Albert. > > > > Best regards, > > Dust > > > > >we need to add a struct net parameter to the smc_ib_find_route function. > > >This allow us to locate the routing information within the corresponding > > >net namespace, ensuring the correct completion of the SMC CLC interaction. > > > > > >Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> > > >--- > > > net/smc/af_smc.c | 3 ++- > > > net/smc/smc_ib.c | 7 ++++--- > > > net/smc/smc_ib.h | 2 +- > > > 3 files changed, 7 insertions(+), 5 deletions(-) > > > > > >diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c > > >index bacdd971615e..7a874da90c7f 100644 > > >--- a/net/smc/af_smc.c > > >+++ b/net/smc/af_smc.c > > >@@ -1201,6 +1201,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > > > (struct smc_clc_msg_accept_confirm_v2 *)aclc; > > > struct smc_clc_first_contact_ext *fce = > > > smc_get_clc_first_contact_ext(clc_v2, false); > > >+ struct net *net = sock_net(&smc->sk); > > > int rc; > > > > > > if (!ini->first_contact_peer || aclc->hdr.version == SMC_V1) > > >@@ -1210,7 +1211,7 @@ static int smc_connect_rdma_v2_prepare(struct smc_sock *smc, > > > memcpy(ini->smcrv2.nexthop_mac, &aclc->r0.lcl.mac, ETH_ALEN); > > > ini->smcrv2.uses_gateway = false; > > > } else { > > >- if (smc_ib_find_route(smc->clcsock->sk->sk_rcv_saddr, > > >+ if (smc_ib_find_route(net, smc->clcsock->sk->sk_rcv_saddr, > > > smc_ib_gid_to_ipv4(aclc->r0.lcl.gid), > > > ini->smcrv2.nexthop_mac, > > > &ini->smcrv2.uses_gateway)) > > >diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c > > >index 9b66d6aeeb1a..89981dbe46c9 100644 > > >--- a/net/smc/smc_ib.c > > >+++ b/net/smc/smc_ib.c > > >@@ -193,7 +193,7 @@ bool smc_ib_port_active(struct smc_ib_device *smcibdev, u8 ibport) > > > return smcibdev->pattr[ibport - 1].state == IB_PORT_ACTIVE; > > > } > > > > > >-int smc_ib_find_route(__be32 saddr, __be32 daddr, > > >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > > > u8 nexthop_mac[], u8 *uses_gateway) > > > { > > > struct neighbour *neigh = NULL; > > >@@ -205,7 +205,7 @@ int smc_ib_find_route(__be32 saddr, __be32 daddr, > > > > > > if (daddr == cpu_to_be32(INADDR_NONE)) > > > goto out; > > >- rt = ip_route_output_flow(&init_net, &fl4, NULL); > > >+ rt = ip_route_output_flow(net, &fl4, NULL); > > > if (IS_ERR(rt)) > > > goto out; > > > if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET) > > >@@ -235,6 +235,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, > > > if (smcrv2 && attr->gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP && > > > smc_ib_gid_to_ipv4((u8 *)&attr->gid) != cpu_to_be32(INADDR_NONE)) { > > > struct in_device *in_dev = __in_dev_get_rcu(ndev); > > >+ struct net *net = dev_net(ndev); > > > const struct in_ifaddr *ifa; > > > bool subnet_match = false; > > > > > >@@ -248,7 +249,7 @@ static int smc_ib_determine_gid_rcu(const struct net_device *ndev, > > > } > > > if (!subnet_match) > > > goto out; > > >- if (smcrv2->daddr && smc_ib_find_route(smcrv2->saddr, > > >+ if (smcrv2->daddr && smc_ib_find_route(net, smcrv2->saddr, > > > smcrv2->daddr, > > > smcrv2->nexthop_mac, > > > &smcrv2->uses_gateway)) > > >diff --git a/net/smc/smc_ib.h b/net/smc/smc_ib.h > > >index 4df5f8c8a0a1..ef8ac2b7546d 100644 > > >--- a/net/smc/smc_ib.h > > >+++ b/net/smc/smc_ib.h > > >@@ -112,7 +112,7 @@ void smc_ib_sync_sg_for_device(struct smc_link *lnk, > > > int smc_ib_determine_gid(struct smc_ib_device *smcibdev, u8 ibport, > > > unsigned short vlan_id, u8 gid[], u8 *sgid_index, > > > struct smc_init_info_smcrv2 *smcrv2); > > >-int smc_ib_find_route(__be32 saddr, __be32 daddr, > > >+int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr, > > > u8 nexthop_mac[], u8 *uses_gateway); > > > bool smc_ib_is_valid_local_systemid(void); > > > int smcr_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb); > > >-- > > >2.37.1 (Apple Git-137.1) > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-27 3:42 ` Dust Li 2023-09-27 5:55 ` Leon Romanovsky @ 2023-10-03 10:41 ` Paolo Abeni 2023-10-03 13:26 ` Dust Li 1 sibling, 1 reply; 18+ messages in thread From: Paolo Abeni @ 2023-10-03 10:41 UTC (permalink / raw) To: dust.li, Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-s390, netdev, linux-kernel On Wed, 2023-09-27 at 11:42 +0800, Dust Li wrote: > On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: > > If the netdevice is within a container and communicates externally > > through network technologies like VXLAN, we won't be able to find > > routing information in the init_net namespace. To address this issue, > > Thanks for your founding ! > > I think this is a more generic problem, but not just related to VXLAN ? > If we use SMC-R v2 and the netdevice is in a net namespace which is not > init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. Re-stating the above to be on the same page: the patch should be re- posted targeting the net tree, and including a suitable fixes tag. @Dust Li: please correct me if I misread you. Thanks, Paolo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-10-03 10:41 ` Paolo Abeni @ 2023-10-03 13:26 ` Dust Li 0 siblings, 0 replies; 18+ messages in thread From: Dust Li @ 2023-10-03 13:26 UTC (permalink / raw) To: Paolo Abeni, Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-s390, netdev, linux-kernel On Tue, Oct 03, 2023 at 12:41:25PM +0200, Paolo Abeni wrote: >On Wed, 2023-09-27 at 11:42 +0800, Dust Li wrote: >> On Mon, Sep 25, 2023 at 10:35:45AM +0800, Albert Huang wrote: >> > If the netdevice is within a container and communicates externally >> > through network technologies like VXLAN, we won't be able to find >> > routing information in the init_net namespace. To address this issue, >> >> Thanks for your founding ! >> >> I think this is a more generic problem, but not just related to VXLAN ? >> If we use SMC-R v2 and the netdevice is in a net namespace which is not >> init_net, we should always fail, right ? If so, I'd prefer this to be a bugfix. > >Re-stating the above to be on the same page: the patch should be re- >posted targeting the net tree, and including a suitable fixes tag. > >@Dust Li: please correct me if I misread you. Right, this is exactly what I mean. Best regards, Dust > >Thanks, > >Paolo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-25 2:35 [PATCH net-next] net/smc: add support for netdevice in containers Albert Huang 2023-09-26 10:48 ` Leon Romanovsky 2023-09-27 3:42 ` Dust Li @ 2023-09-28 15:04 ` Niklas Schnelle 2023-10-11 14:48 ` Dust Li 2 siblings, 1 reply; 18+ messages in thread From: Niklas Schnelle @ 2023-09-28 15:04 UTC (permalink / raw) To: Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote: > If the netdevice is within a container and communicates externally > through network technologies like VXLAN, we won't be able to find > routing information in the init_net namespace. To address this issue, > we need to add a struct net parameter to the smc_ib_find_route function. > This allow us to locate the routing information within the corresponding > net namespace, ensuring the correct completion of the SMC CLC interaction. > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> > --- > net/smc/af_smc.c | 3 ++- > net/smc/smc_ib.c | 7 ++++--- > net/smc/smc_ib.h | 2 +- > 3 files changed, 7 insertions(+), 5 deletions(-) > I'm trying to test this patch on s390x but I'm running into the same issue I ran into with the original SMC namespace support:https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/ Just like back then I'm using a server and a client network namespace on the same system with two ConnectX-4 VFs from the same card and port. Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and `qperf … rc_lat` work between namespaces and definitely go via the card. I did use "rdma system set netns exclusive" then moved the RDMA devices into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I also verified with "ip netns exec <namespace> rdma dev" that the RDMA devices are in the network namespace and as seen by the qperf runs normal RDMA does work. For reference the smc_chck tool gives me the following output: Server started on port 37373 [DEBUG] Interfaces to check: eno4378 Test with target IP 10.10.93.12 and port 37373 Live test (SMC-D and SMC-R) [DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p 37373 [DEBUG] Client result: TCP 0x05000000/0x03030000 Failed (TCP fallback), reasons: Client: 0x05000000 Peer declined during handshake Server: 0x03030000 No SMC devices found (R and D) I also checked that SMC is generally working, once I add an ISM device I do get SMC-D between the namespaces. Any ideas what could break SMC-R here? Thanks, Niklas ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-09-28 15:04 ` Niklas Schnelle @ 2023-10-11 14:48 ` Dust Li 2023-10-12 12:17 ` Dust Li 0 siblings, 1 reply; 18+ messages in thread From: Dust Li @ 2023-10-11 14:48 UTC (permalink / raw) To: Niklas Schnelle, Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote: >On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote: >> If the netdevice is within a container and communicates externally >> through network technologies like VXLAN, we won't be able to find >> routing information in the init_net namespace. To address this issue, >> we need to add a struct net parameter to the smc_ib_find_route function. >> This allow us to locate the routing information within the corresponding >> net namespace, ensuring the correct completion of the SMC CLC interaction. >> >> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> >> --- >> net/smc/af_smc.c | 3 ++- >> net/smc/smc_ib.c | 7 ++++--- >> net/smc/smc_ib.h | 2 +- >> 3 files changed, 7 insertions(+), 5 deletions(-) >> > >I'm trying to test this patch on s390x but I'm running into the same >issue I ran into with the original SMC namespace >support:https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/ > >Just like back then I'm using a server and a client network namespace >on the same system with two ConnectX-4 VFs from the same card and port. >Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and >`qperf … rc_lat` work between namespaces and definitely go via the >card. > >I did use "rdma system set netns exclusive" then moved the RDMA devices >into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I >also verified with "ip netns exec <namespace> rdma dev" >that the RDMA devices are in the network namespace and as seen by the >qperf runs normal RDMA does work. > >For reference the smc_chck tool gives me the following output: > >Server started on port 37373 >[DEBUG] Interfaces to check: eno4378 >Test with target IP 10.10.93.12 and port 37373 > Live test (SMC-D and SMC-R) >[DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p >37373 >[DEBUG] Client result: TCP 0x05000000/0x03030000 > Failed (TCP fallback), reasons: > Client: 0x05000000 Peer declined during handshake > Server: 0x03030000 No SMC devices found (R and D) > >I also checked that SMC is generally working, once I add an ISM device >I do get SMC-D between the namespaces. Any ideas what could break SMC-R >here? I missed the email :( Are you running SMC-Rv2 or v1 ? Best regards, Dust > >Thanks, >Niklas ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-10-11 14:48 ` Dust Li @ 2023-10-12 12:17 ` Dust Li 2023-10-12 19:23 ` Wenjia Zhang 2023-10-13 8:04 ` Niklas Schnelle 0 siblings, 2 replies; 18+ messages in thread From: Dust Li @ 2023-10-12 12:17 UTC (permalink / raw) To: Niklas Schnelle, Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Wed, Oct 11, 2023 at 10:48:16PM +0800, Dust Li wrote: >On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote: >>On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote: >>> If the netdevice is within a container and communicates externally >>> through network technologies like VXLAN, we won't be able to find >>> routing information in the init_net namespace. To address this issue, >>> we need to add a struct net parameter to the smc_ib_find_route function. >>> This allow us to locate the routing information within the corresponding >>> net namespace, ensuring the correct completion of the SMC CLC interaction. >>> >>> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> >>> --- >>> net/smc/af_smc.c | 3 ++- >>> net/smc/smc_ib.c | 7 ++++--- >>> net/smc/smc_ib.h | 2 +- >>> 3 files changed, 7 insertions(+), 5 deletions(-) >>> >> >>I'm trying to test this patch on s390x but I'm running into the same >>issue I ran into with the original SMC namespace >>support:https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/ >> >>Just like back then I'm using a server and a client network namespace >>on the same system with two ConnectX-4 VFs from the same card and port. >>Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and >>`qperf … rc_lat` work between namespaces and definitely go via the >>card. >> >>I did use "rdma system set netns exclusive" then moved the RDMA devices >>into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I >>also verified with "ip netns exec <namespace> rdma dev" >>that the RDMA devices are in the network namespace and as seen by the >>qperf runs normal RDMA does work. >> >>For reference the smc_chck tool gives me the following output: >> >>Server started on port 37373 >>[DEBUG] Interfaces to check: eno4378 >>Test with target IP 10.10.93.12 and port 37373 >> Live test (SMC-D and SMC-R) >>[DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p >>37373 >>[DEBUG] Client result: TCP 0x05000000/0x03030000 >> Failed (TCP fallback), reasons: >> Client: 0x05000000 Peer declined during handshake >> Server: 0x03030000 No SMC devices found (R and D) >> >>I also checked that SMC is generally working, once I add an ISM device >>I do get SMC-D between the namespaces. Any ideas what could break SMC-R >>here? > >I missed the email :( > >Are you running SMC-Rv2 or v1 ? Hi Niklas, I tried your test today, and I encounter the same issue. But I found it's because my 2 VFs are in difference subnets, SMC-Rv2 work fine, SMC-Rv1 won't work, which is expected. When I set the 2 VFs in the same subnet, SMC-Rv1 also works. So I'm not sure it's the same for you. Can you check it out ? BTW, the fallback reason(SMC_CLC_DECL_NOSMCDEV) in this case is really not friendly, it's better to return SMC_CLC_DECL_DIFFPREFIX. Best regards, Dust > >Best regards, >Dust > > >> >>Thanks, >>Niklas ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-10-12 12:17 ` Dust Li @ 2023-10-12 19:23 ` Wenjia Zhang 2023-10-13 8:04 ` Niklas Schnelle 1 sibling, 0 replies; 18+ messages in thread From: Wenjia Zhang @ 2023-10-12 19:23 UTC (permalink / raw) To: dust.li, Niklas Schnelle, Albert Huang, Karsten Graul, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On 12.10.23 14:17, Dust Li wrote: > On Wed, Oct 11, 2023 at 10:48:16PM +0800, Dust Li wrote: >> On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote: >>> On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote: >>>> If the netdevice is within a container and communicates externally >>>> through network technologies like VXLAN, we won't be able to find >>>> routing information in the init_net namespace. To address this issue, >>>> we need to add a struct net parameter to the smc_ib_find_route function. >>>> This allow us to locate the routing information within the corresponding >>>> net namespace, ensuring the correct completion of the SMC CLC interaction. >>>> >>>> Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> >>>> --- >>>> net/smc/af_smc.c | 3 ++- >>>> net/smc/smc_ib.c | 7 ++++--- >>>> net/smc/smc_ib.h | 2 +- >>>> 3 files changed, 7 insertions(+), 5 deletions(-) >>>> >>> >>> I'm trying to test this patch on s390x but I'm running into the same >>> issue I ran into with the original SMC namespace >>> support:https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/ >>> >>> Just like back then I'm using a server and a client network namespace >>> on the same system with two ConnectX-4 VFs from the same card and port. >>> Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and >>> `qperf … rc_lat` work between namespaces and definitely go via the >>> card. >>> >>> I did use "rdma system set netns exclusive" then moved the RDMA devices >>> into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I >>> also verified with "ip netns exec <namespace> rdma dev" >>> that the RDMA devices are in the network namespace and as seen by the >>> qperf runs normal RDMA does work. >>> >>> For reference the smc_chck tool gives me the following output: >>> >>> Server started on port 37373 >>> [DEBUG] Interfaces to check: eno4378 >>> Test with target IP 10.10.93.12 and port 37373 >>> Live test (SMC-D and SMC-R) >>> [DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p >>> 37373 >>> [DEBUG] Client result: TCP 0x05000000/0x03030000 >>> Failed (TCP fallback), reasons: >>> Client: 0x05000000 Peer declined during handshake >>> Server: 0x03030000 No SMC devices found (R and D) >>> >>> I also checked that SMC is generally working, once I add an ISM device >>> I do get SMC-D between the namespaces. Any ideas what could break SMC-R >>> here? >> >> I missed the email :( >> >> Are you running SMC-Rv2 or v1 ? > > Hi Niklas, > > I tried your test today, and I encounter the same issue. > But I found it's because my 2 VFs are in difference subnets, > SMC-Rv2 work fine, SMC-Rv1 won't work, which is expected. > When I set the 2 VFs in the same subnet, SMC-Rv1 also works. > > So I'm not sure it's the same for you. Can you check it out ? > > BTW, the fallback reason(SMC_CLC_DECL_NOSMCDEV) in this case > is really not friendly, it's better to return SMC_CLC_DECL_DIFFPREFIX. > > Best regards, > Dust > Thank you, Dust, for trying it out! The reason code SMC_CLC_DECL_NOSMCDEV there could really make one misunderstand. > >> >> Best regards, >> Dust >> >> >>> >>> Thanks, >>> Niklas ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH net-next] net/smc: add support for netdevice in containers. 2023-10-12 12:17 ` Dust Li 2023-10-12 19:23 ` Wenjia Zhang @ 2023-10-13 8:04 ` Niklas Schnelle 1 sibling, 0 replies; 18+ messages in thread From: Niklas Schnelle @ 2023-10-13 8:04 UTC (permalink / raw) To: dust.li, Albert Huang, Karsten Graul, Wenjia Zhang, Jan Karcher Cc: D. Wythe, Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-s390, netdev, linux-kernel On Thu, 2023-10-12 at 20:17 +0800, Dust Li wrote: > On Wed, Oct 11, 2023 at 10:48:16PM +0800, Dust Li wrote: > > On Thu, Sep 28, 2023 at 05:04:21PM +0200, Niklas Schnelle wrote: > > > On Mon, 2023-09-25 at 10:35 +0800, Albert Huang wrote: > > > > If the netdevice is within a container and communicates externally > > > > through network technologies like VXLAN, we won't be able to find > > > > routing information in the init_net namespace. To address this issue, > > > > we need to add a struct net parameter to the smc_ib_find_route function. > > > > This allow us to locate the routing information within the corresponding > > > > net namespace, ensuring the correct completion of the SMC CLC interaction. > > > > > > > > Signed-off-by: Albert Huang <huangjie.albert@bytedance.com> > > > > --- > > > > net/smc/af_smc.c | 3 ++- > > > > net/smc/smc_ib.c | 7 ++++--- > > > > net/smc/smc_ib.h | 2 +- > > > > 3 files changed, 7 insertions(+), 5 deletions(-) > > > > > > > > > > I'm trying to test this patch on s390x but I'm running into the same > > > issue I ran into with the original SMC namespace > > > support:https://lore.kernel.org/netdev/8701fa4557026983a9ec687cfdd7ac5b3b85fd39.camel@linux.ibm.com/ > > > > > > Just like back then I'm using a server and a client network namespace > > > on the same system with two ConnectX-4 VFs from the same card and port. > > > Both TCP/IP traffic as well as user-space RDMA via "qperf … rc_bw" and > > > `qperf … rc_lat` work between namespaces and definitely go via the > > > card. > > > > > > I did use "rdma system set netns exclusive" then moved the RDMA devices > > > into the namespaces with "rdma dev set <rdma_dev> netns <namespace>". I > > > also verified with "ip netns exec <namespace> rdma dev" > > > that the RDMA devices are in the network namespace and as seen by the > > > qperf runs normal RDMA does work. > > > > > > For reference the smc_chck tool gives me the following output: > > > > > > Server started on port 37373 > > > [DEBUG] Interfaces to check: eno4378 > > > Test with target IP 10.10.93.12 and port 37373 > > > Live test (SMC-D and SMC-R) > > > [DEBUG] Running client: smc_run /tmp/echo-clt.x0q8iO 10.10.93.12 -p > > > 37373 > > > [DEBUG] Client result: TCP 0x05000000/0x03030000 > > > Failed (TCP fallback), reasons: > > > Client: 0x05000000 Peer declined during handshake > > > Server: 0x03030000 No SMC devices found (R and D) > > > > > > I also checked that SMC is generally working, once I add an ISM device > > > I do get SMC-D between the namespaces. Any ideas what could break SMC-R > > > here? > > > > I missed the email :( > > > > Are you running SMC-Rv2 or v1 ? > > Hi Niklas, > > I tried your test today, and I encounter the same issue. > But I found it's because my 2 VFs are in difference subnets, > SMC-Rv2 work fine, SMC-Rv1 won't work, which is expected. > When I set the 2 VFs in the same subnet, SMC-Rv1 also works. > > So I'm not sure it's the same for you. Can you check it out ? > > BTW, the fallback reason(SMC_CLC_DECL_NOSMCDEV) in this case > is really not friendly, it's better to return SMC_CLC_DECL_DIFFPREFIX. > > Best regards, > Dust I think you are right. I did use two consecutive private IPs but I had set the subnet mask to /32. Setting that to /16 the SMC-R connection is established. I'll work with Wenjia and Jan on why my system is defaulting to SMC-Rv1 I would have hoped to get SMC-Rv2. Thanks for your insights! Niklas ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-10-13 8:05 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-25 2:35 [PATCH net-next] net/smc: add support for netdevice in containers Albert Huang 2023-09-26 10:48 ` Leon Romanovsky 2023-09-26 11:14 ` Alexandra Winter 2023-09-26 11:41 ` Leon Romanovsky 2023-09-26 12:09 ` Dust Li 2023-09-26 17:30 ` Leon Romanovsky 2023-09-27 3:42 ` Dust Li 2023-09-27 5:55 ` Leon Romanovsky 2023-09-27 12:17 ` Dust Li 2023-09-28 9:51 ` Leon Romanovsky 2023-09-28 3:11 ` [External] " 黄杰 2023-10-03 10:41 ` Paolo Abeni 2023-10-03 13:26 ` Dust Li 2023-09-28 15:04 ` Niklas Schnelle 2023-10-11 14:48 ` Dust Li 2023-10-12 12:17 ` Dust Li 2023-10-12 19:23 ` Wenjia Zhang 2023-10-13 8:04 ` Niklas Schnelle
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).