* Re: [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace
[not found] <20230214060634.427162-1-yanjun.zhu@intel.com>
@ 2023-02-23 0:31 ` Zhu Yanjun
2023-02-23 4:56 ` Jakub Kicinski
2023-02-25 8:43 ` Rain River
[not found] ` <20230214060634.427162-2-yanjun.zhu@intel.com>
` (7 subsequent siblings)
8 siblings, 2 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 0:31 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> When the "rdma link add" command is run to add an rxe rdma link in a
> net namespace, the rxe rdma link normally cannot work in that net
> namespace.
>
> The root cause is that the sock listening on udp port 4791 is created
> in init_net when the rdma_rxe module is loaded into the kernel, so it
> is difficult for other net namespaces to use this sock.
>
> The following commits will solve this problem.
>
> In the first commit, the creation of the sock listening on udp port
> 4791 is moved from the module_init function to the rdma link creation
> function. That is, the sock is no longer created when the rdma_rxe
> module is loaded; it is created when the "rdma link add ..." command
> is run. So when an rdma link is created in a net namespace, the sock
> is created in that net namespace.
>
> In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup
> check whether the sock already exists in the net namespace. If it
> does, the rdma link increases the reference count of this sock and
> continues with its other work instead of creating a new sock to listen
> on udp port 4791. Since the network notifier is global, it is
> registered when the rdma_rxe module is loaded.
>
> After an rdma link is created, the command "rdma link del" deletes it,
> and the sock is checked at the same time. If the reference count of
> this sock is greater than the reference count needed by the udp
> tunnel, the reference count is decreased by one. If they are equal,
> this rdma link is the last one, so the udp tunnel is shut down and the
> sock is closed. This work belongs in a dellink function, but rxe
> currently has none. So the 3rd commit adds the dellink function
> pointer, and the 4th commit implements the dellink function in rxe.
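
The add/delete accounting described above can be modeled in user space.
This is only a minimal sketch, not the driver code: struct fake_sock,
link_add() and link_del() are hypothetical names, and the baseline count
of 2 simply mirrors the SK_REF_FOR_TUNNEL constant used in patch 4/8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define SK_REF_FOR_TUNNEL 2	/* references held by the udp tunnel itself */

struct fake_sock {
	bool open;		/* true while the tunnel sock exists */
	unsigned int refcnt;	/* models sk->sk_refcnt */
};

/* Models "rdma link add": create the sock once, then only take references. */
void link_add(struct fake_sock *sk)
{
	assert(sk != NULL);
	if (!sk->open) {
		sk->open = true;
		sk->refcnt = SK_REF_FOR_TUNNEL;	/* first link sets up the tunnel */
	} else {
		sk->refcnt++;			/* later links share the sock */
	}
}

/* Models "rdma link del": drop one reference, or close on the last link. */
void link_del(struct fake_sock *sk)
{
	assert(sk != NULL);
	if (!sk->open)
		return;
	if (sk->refcnt > SK_REF_FOR_TUNNEL) {
		sk->refcnt--;
	} else {
		sk->open = false;		/* last link: shut the tunnel down */
		sk->refcnt = 0;
	}
}
```

With two links added and then deleted, the sock stays open after the
first delete and is closed by the second one.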
>
> At this point, it is no longer necessary to keep a global variable to
> store the sock listening on udp port 4791. This global variable can be
> replaced entirely by the functions udp4_lib_lookup and
> udp6_lib_lookup. Because udp6_lib_lookup would be called in the fast
> path, a member variable l_sk6 is added to store the sock. If l_sk6 is
> NULL, udp6_lib_lookup is called to look up the sock, which is then
> stored in l_sk6 so that it can be used directly afterwards.
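
The l_sk6 caching idea can be illustrated with a small user-space
sketch. Everything here is an assumption made for illustration:
slow_lookup() merely stands in for udp6_lib_lookup(), and struct
fake_dev stands in for struct rxe_dev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the l_sk6 member name comes from the series. */
struct fake_sock {
	int port;
};

struct fake_dev {
	struct fake_sock *l_sk6;	/* cached sock, NULL until first use */
};

static struct fake_sock listener = { 4791 };
unsigned int slow_lookups;		/* counts how often the slow path ran */

/* Stand-in for udp6_lib_lookup(): pretend this is the expensive call. */
struct fake_sock *slow_lookup(void)
{
	slow_lookups++;
	return &listener;
}

/* Return the cached sock, filling the cache on the first call only. */
struct fake_sock *dev_get_sk6(struct fake_dev *dev)
{
	assert(dev != NULL);
	if (!dev->l_sk6)
		dev->l_sk6 = slow_lookup();
	return dev->l_sk6;
}
```

Calling dev_get_sk6() repeatedly on the same device runs the slow lookup
once; later calls return the cached pointer.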
>
> All the above work is done in init_net, and it also works in other
> net namespaces. So init_net is replaced by the individual net
> namespace. This is what the 6th commit does. Because an rxe device
> depends on the net device and the sock listening on udp port 4791,
> every rxe device is in exclusive mode in its individual net namespace.
> Other rdma netns operations will be considered in the future.
>
> In the 7th commit, the register_pernet_subsys/unregister_pernet_subsys
> functions are added. When a new net namespace is created, the init
> function initializes the sk4 and sk6 socks. The 2 socks are released
> when the net namespace is destroyed. The functions
> rxe_ns_pernet_sk4/rxe_ns_pernet_set_sk4 get and set sk4 in the net
> namespace, and the functions rxe_ns_pernet_sk6/rxe_ns_pernet_set_sk6
> do the same for sk6. sk4 and sk6 are then used by the previous commits.
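
The per-namespace get/set pattern can be sketched in user space as
follows. This is an illustrative model under stated assumptions, not the
contents of rxe_ns.c: struct fake_net and the fake_ns_*() helpers are
hypothetical and only mimic the shape of the rxe_ns_pernet_* accessors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each namespace owns its own pair of sock pointers. */
struct fake_sock {
	int family;
};

struct fake_ns_sock {
	struct fake_sock *sk4;
	struct fake_sock *sk6;
};

struct fake_net {
	struct fake_ns_sock rxe;	/* models the per-net private area */
};

/* Models the pernet init hook: a new namespace starts with no socks. */
void fake_ns_init(struct fake_net *net)
{
	assert(net != NULL);
	net->rxe.sk4 = NULL;
	net->rxe.sk6 = NULL;
}

struct fake_sock *fake_ns_sk4(struct fake_net *net)
{
	return net->rxe.sk4;
}

void fake_ns_set_sk4(struct fake_net *net, struct fake_sock *sk)
{
	net->rxe.sk4 = sk;
}

struct fake_sock *fake_ns_sk6(struct fake_net *net)
{
	return net->rxe.sk6;
}

void fake_ns_set_sk6(struct fake_net *net, struct fake_sock *sk)
{
	net->rxe.sk6 = sk;
}
```

Setting sk4 in one namespace leaves the other namespace untouched, which
is the isolation property the series relies on.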
>
> As the sk4 and sk6 stored per net namespace can be accessed, it is not
> necessary to add a new l_sk6. As such, in the 8th commit, l_sk6 is
> replaced with the sk6 stored per net namespace.
>
> Test steps:
> 1) Suppose that 2 NICs are in 2 different net namespaces.
>
> # ip netns exec net0 ip link
> 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff
> altname enp5s0
>
> # ip netns exec net1 ip link
> 4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff
>
> 2) Add an rdma link in each net namespace
> net0:
> # ip netns exec net0 rdma link add rxe0 type rxe netdev eno2
>
> net1:
> # ip netns exec net1 rdma link add rxe1 type rxe netdev eno3
>
> 3) Run rping test.
> net0
> # ip netns exec net0 rping -s -a 192.168.2.1 -C 1&
> [1] 1737
> # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1
> verbose
> count 1
> ...
> ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
> ...
>
> 4) Remove the rdma links from the net namespaces.
> net0:
> # ip netns exec net0 ss -lu
> State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> UNCONN 0 0 0.0.0.0:4791 0.0.0.0:*
> UNCONN 0 0 [::]:4791 [::]:*
>
> # ip netns exec net0 rdma link del rxe0
>
> # ip netns exec net0 ss -lu
> State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
>
> net1:
> # ip netns exec net1 ss -lu
> State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> UNCONN 0 0 0.0.0.0:4791 0.0.0.0:*
> UNCONN 0 0 [::]:4791 [::]:*
>
> # ip netns exec net1 rdma link del rxe1
>
> # ip netns exec net1 ss -lu
> State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
>
> V2->V3: 1) Add "rdma link del" example in the cover letter, and use "ss -lu" to
> verify rdma link is removed.
> 2) Add register_pernet_subsys/unregister_pernet_subsys for net namespaces
> 3) Replace l_sk6 with the sk6 stored per net namespace
>
> V1->V2: Add the explicit initialization of sk6.
Add netdev@vger.kernel.org.
Zhu Yanjun
>
> Zhu Yanjun (8):
> RDMA/rxe: Creating listening sock in newlink function
> RDMA/rxe: Support more rdma links in init_net
> RDMA/nldev: Add dellink function pointer
> RDMA/rxe: Implement dellink in rxe
> RDMA/rxe: Replace global variable with sock lookup functions
> RDMA/rxe: add the support of net namespace
> RDMA/rxe: Add the support of net namespace notifier
> RDMA/rxe: Replace l_sk6 with sk6 in net namespace
>
> drivers/infiniband/core/nldev.c | 6 ++
> drivers/infiniband/sw/rxe/Makefile | 3 +-
> drivers/infiniband/sw/rxe/rxe.c | 35 +++++++-
> drivers/infiniband/sw/rxe/rxe_net.c | 113 +++++++++++++++++-------
> drivers/infiniband/sw/rxe/rxe_net.h | 9 +-
> drivers/infiniband/sw/rxe/rxe_ns.c | 128 ++++++++++++++++++++++++++++
> drivers/infiniband/sw/rxe/rxe_ns.h | 11 +++
> include/rdma/rdma_netlink.h | 2 +
> 8 files changed, 267 insertions(+), 40 deletions(-)
> create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
> create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h
>
* Re: [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace
2023-02-23 0:31 ` [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace Zhu Yanjun
@ 2023-02-23 4:56 ` Jakub Kicinski
2023-02-23 11:42 ` Zhu Yanjun
2023-02-25 8:43 ` Rain River
1 sibling, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2023-02-23 4:56 UTC
To: Zhu Yanjun
Cc: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
On Thu, 23 Feb 2023 08:31:49 +0800 Zhu Yanjun wrote:
> > V1->V2: Add the explicit initialization of sk6.
>
> Add netdev@vger.kernel.org.
On the commit letter? Thanks, but that's not how it works.
Repost the patches if you want us to see them.
* Re: [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace
2023-02-23 4:56 ` Jakub Kicinski
@ 2023-02-23 11:42 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 11:42 UTC
To: Jakub Kicinski
Cc: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
On 2023/2/23 12:56, Jakub Kicinski wrote:
> On Thu, 23 Feb 2023 08:31:49 +0800 Zhu Yanjun wrote:
>>> V1->V2: Add the explicit initialization of sk6.
>> Add netdev@vger.kernel.org.
> On the commit letter? Thanks, but that's not how it works.
> Repost the patches if you want us to see them.
Got it. I will resend all the commits.
Zhu Yanjun
* Re: [PATCHv3 1/8] RDMA/rxe: Creating listening sock in newlink function
[not found] ` <20230214060634.427162-2-yanjun.zhu@intel.com>
@ 2023-02-23 13:10 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:10 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Originally, the sock listening on udp port 4791 is created when the
> rdma_rxe module is loaded. Move the creation of the listening sock to
> the newlink function.
>
> So the sock listening on udp port 4791 is created when the "rdma link
> add" command is run.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 136c2efe3466..64644cb0bb38 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -192,6 +192,10 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
> goto err;
> }
>
> + err = rxe_net_init();
> + if (err)
> + return err;
> +
> err = rxe_net_add(ibdev_name, ndev);
> if (err) {
> rxe_dbg(exists, "failed to add %s\n", ndev->name);
> @@ -208,12 +212,6 @@ static struct rdma_link_ops rxe_link_ops = {
>
> static int __init rxe_module_init(void)
> {
> - int err;
> -
> - err = rxe_net_init();
> - if (err)
> - return err;
> -
> rdma_link_register(&rxe_link_ops);
> pr_info("loaded\n");
> return 0;
* Re: [PATCHv3 2/8] RDMA/rxe: Support more rdma links in init_net
[not found] ` <20230214060634.427162-3-yanjun.zhu@intel.com>
@ 2023-02-23 13:10 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:10 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> In init_net, when several rdma links are created with the command "rdma
> link add", newlink checks whether a sock is already listening on udp
> port 4791.
> If not, a sock listening on udp port 4791 is created. If yes, the
> reference count of the existing sock is increased.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe.c | 12 ++++++-
> drivers/infiniband/sw/rxe/rxe_net.c | 55 +++++++++++++++++++++--------
> drivers/infiniband/sw/rxe/rxe_net.h | 1 +
> 3 files changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 64644cb0bb38..0ce6adb43cfc 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -8,6 +8,7 @@
> #include <net/addrconf.h>
> #include "rxe.h"
> #include "rxe_loc.h"
> +#include "rxe_net.h"
>
> MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
> MODULE_DESCRIPTION("Soft RDMA transport");
> @@ -205,14 +206,23 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
> return err;
> }
>
> -static struct rdma_link_ops rxe_link_ops = {
> +struct rdma_link_ops rxe_link_ops = {
> .type = "rxe",
> .newlink = rxe_newlink,
> };
>
> static int __init rxe_module_init(void)
> {
> + int err;
> +
> rdma_link_register(&rxe_link_ops);
> +
> + err = rxe_register_notifier();
> + if (err) {
> + pr_err("Failed to register netdev notifier\n");
> + return -1;
> + }
> +
> pr_info("loaded\n");
> return 0;
> }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index e02e1624bcf4..3ca92e062800 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -623,13 +623,23 @@ static struct notifier_block rxe_net_notifier = {
>
> static int rxe_net_ipv4_init(void)
> {
> - recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
> - htons(ROCE_V2_UDP_DPORT), false);
> - if (IS_ERR(recv_sockets.sk4)) {
> - recv_sockets.sk4 = NULL;
> + struct sock *sk;
> + struct socket *sock;
> +
> + rcu_read_lock();
> + sk = udp4_lib_lookup(&init_net, 0, 0, htonl(INADDR_ANY),
> + htons(ROCE_V2_UDP_DPORT), 0);
> + rcu_read_unlock();
> + if (sk)
> + return 0;
> +
> + sock = rxe_setup_udp_tunnel(&init_net, htons(ROCE_V2_UDP_DPORT), false);
> + if (IS_ERR(sock)) {
> pr_err("Failed to create IPv4 UDP tunnel\n");
> + recv_sockets.sk4 = NULL;
> return -1;
> }
> + recv_sockets.sk4 = sock;
>
> return 0;
> }
> @@ -637,24 +647,46 @@ static int rxe_net_ipv4_init(void)
> static int rxe_net_ipv6_init(void)
> {
> #if IS_ENABLED(CONFIG_IPV6)
> + struct sock *sk;
> + struct socket *sock;
> +
> + rcu_read_lock();
> + sk = udp6_lib_lookup(&init_net, NULL, 0, &in6addr_any,
> + htons(ROCE_V2_UDP_DPORT), 0);
> + rcu_read_unlock();
> + if (sk)
> + return 0;
>
> - recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
> - htons(ROCE_V2_UDP_DPORT), true);
> - if (PTR_ERR(recv_sockets.sk6) == -EAFNOSUPPORT) {
> + sock = rxe_setup_udp_tunnel(&init_net, htons(ROCE_V2_UDP_DPORT), true);
> + if (PTR_ERR(sock) == -EAFNOSUPPORT) {
> recv_sockets.sk6 = NULL;
> pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
> return 0;
> }
>
> - if (IS_ERR(recv_sockets.sk6)) {
> + if (IS_ERR(sock)) {
> recv_sockets.sk6 = NULL;
> pr_err("Failed to create IPv6 UDP tunnel\n");
> return -1;
> }
> + recv_sockets.sk6 = sock;
> #endif
> return 0;
> }
>
> +int rxe_register_notifier(void)
> +{
> + int err;
> +
> + err = register_netdevice_notifier(&rxe_net_notifier);
> + if (err) {
> + pr_err("Failed to register netdev notifier\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> void rxe_net_exit(void)
> {
> rxe_release_udp_tunnel(recv_sockets.sk6);
> @@ -666,19 +698,12 @@ int rxe_net_init(void)
> {
> int err;
>
> - recv_sockets.sk6 = NULL;
> -
> err = rxe_net_ipv4_init();
> if (err)
> return err;
> err = rxe_net_ipv6_init();
> if (err)
> goto err_out;
> - err = register_netdevice_notifier(&rxe_net_notifier);
> - if (err) {
> - pr_err("Failed to register netdev notifier\n");
> - goto err_out;
> - }
> return 0;
> err_out:
> rxe_net_exit();
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index 45d80d00f86b..a222c3eeae12 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -18,6 +18,7 @@ struct rxe_recv_sockets {
>
> int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
>
> +int rxe_register_notifier(void);
> int rxe_net_init(void);
> void rxe_net_exit(void);
>
* Re: [PATCHv3 3/8] RDMA/nldev: Add dellink function pointer
[not found] ` <20230214060634.427162-4-yanjun.zhu@intel.com>
@ 2023-02-23 13:11 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:11 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> The newlink function pointer already exists, and the sock listening on
> port 4791 is now created in the newlink function. So a dellink function
> is needed to remove the sock.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/core/nldev.c | 6 ++++++
> include/rdma/rdma_netlink.h | 2 ++
> 2 files changed, 8 insertions(+)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index d5d3e4f0de77..97a62685ed5b 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -1758,6 +1758,12 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> return -EINVAL;
> }
>
> + if (device->link_ops) {
> + err = device->link_ops->dellink(device);
> + if (err)
> + return err;
> + }
> +
> ib_unregister_device_and_put(device);
> return 0;
> }
> diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
> index c2a79aeee113..bf9df004061f 100644
> --- a/include/rdma/rdma_netlink.h
> +++ b/include/rdma/rdma_netlink.h
> @@ -5,6 +5,7 @@
>
> #include <linux/netlink.h>
> #include <uapi/rdma/rdma_netlink.h>
> +#include <rdma/ib_verbs.h>
>
> enum {
> RDMA_NLDEV_ATTR_EMPTY_STRING = 1,
> @@ -114,6 +115,7 @@ struct rdma_link_ops {
> struct list_head list;
> const char *type;
> int (*newlink)(const char *ibdev_name, struct net_device *ndev);
> + int (*dellink)(struct ib_device *dev);
> };
>
> void rdma_link_register(struct rdma_link_ops *ops);
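
The hunk above only tests device->link_ops before calling dellink.
Assuming some link providers may leave the new dellink pointer unset
(an assumption, not something this thread confirms), a guarded dispatch
could look like the user-space sketch below. All fake_* names and
busy_dellink() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct fake_device;

/* Hypothetical model of an ops table with an optional dellink callback. */
struct fake_link_ops {
	const char *type;
	int (*dellink)(struct fake_device *dev);	/* may be NULL */
};

struct fake_device {
	const struct fake_link_ops *link_ops;		/* may be NULL */
	int unregistered;				/* set once removed */
};

/* Example provider callback that refuses the removal. */
int busy_dellink(struct fake_device *dev)
{
	(void)dev;
	return -1;
}

/* Call dellink only when both the ops table and the callback exist. */
int fake_dellink(struct fake_device *dev)
{
	int err;

	assert(dev != NULL);
	if (dev->link_ops && dev->link_ops->dellink) {
		err = dev->link_ops->dellink(dev);
		if (err)
			return err;	/* provider vetoed the removal */
	}

	dev->unregistered = 1;		/* models ib_unregister_device_and_put() */
	return 0;
}
```

A device without link_ops, or with link_ops but no dellink, is removed
directly; a failing callback leaves the device registered.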
* Re: [PATCHv3 4/8] RDMA/rxe: Implement dellink in rxe
[not found] ` <20230214060634.427162-5-yanjun.zhu@intel.com>
@ 2023-02-23 13:12 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:12 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> When the "rdma link del" command is run, the dellink function is called.
> If the sock refcnt is greater than the refcnt needed for the udp tunnel,
> the sock refcnt is decreased by 1.
>
> If they are equal, the last rdma link is being removed, and the udp
> tunnel is destroyed.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe.c | 12 +++++++++++-
> drivers/infiniband/sw/rxe/rxe_net.c | 17 +++++++++++++++--
> drivers/infiniband/sw/rxe/rxe_net.h | 1 +
> 3 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 0ce6adb43cfc..ebfabc6d6b76 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -166,10 +166,12 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
> /* called by ifc layer to create new rxe device.
> * The caller should allocate memory for rxe by calling ib_alloc_device.
> */
> +static struct rdma_link_ops rxe_link_ops;
> int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name)
> {
> rxe_init(rxe);
> rxe_set_mtu(rxe, mtu);
> + rxe->ib_dev.link_ops = &rxe_link_ops;
>
> return rxe_register_device(rxe, ibdev_name);
> }
> @@ -206,9 +208,17 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
> return err;
> }
>
> -struct rdma_link_ops rxe_link_ops = {
> +static int rxe_dellink(struct ib_device *dev)
> +{
> + rxe_net_del(dev);
> +
> + return 0;
> +}
> +
> +static struct rdma_link_ops rxe_link_ops = {
> .type = "rxe",
> .newlink = rxe_newlink,
> + .dellink = rxe_dellink,
> };
>
> static int __init rxe_module_init(void)
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 3ca92e062800..4cc7de7b115b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -530,6 +530,21 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
> return 0;
> }
>
> +#define SK_REF_FOR_TUNNEL 2
> +void rxe_net_del(struct ib_device *dev)
> +{
> + if (refcount_read(&recv_sockets.sk6->sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> + __sock_put(recv_sockets.sk6->sk);
> + else
> + rxe_release_udp_tunnel(recv_sockets.sk6);
> +
> + if (refcount_read(&recv_sockets.sk4->sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> + __sock_put(recv_sockets.sk4->sk);
> + else
> + rxe_release_udp_tunnel(recv_sockets.sk4);
> +}
> +#undef SK_REF_FOR_TUNNEL
> +
> static void rxe_port_event(struct rxe_dev *rxe,
> enum ib_event_type event)
> {
> @@ -689,8 +704,6 @@ int rxe_register_notifier(void)
>
> void rxe_net_exit(void)
> {
> - rxe_release_udp_tunnel(recv_sockets.sk6);
> - rxe_release_udp_tunnel(recv_sockets.sk4);
> unregister_netdevice_notifier(&rxe_net_notifier);
> }
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index a222c3eeae12..f48f22f3353b 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -17,6 +17,7 @@ struct rxe_recv_sockets {
> };
>
> int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
> +void rxe_net_del(struct ib_device *dev);
>
> int rxe_register_notifier(void);
> int rxe_net_init(void);
* Re: [PATCHv3 5/8] RDMA/rxe: Replace global variable with sock lookup functions
[not found] ` <20230214060634.427162-6-yanjun.zhu@intel.com>
@ 2023-02-23 13:13 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:13 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Originally, a global variable keeps the udp sock listening on port
> 4791. In fact, the sock lookup functions can be used to get the sock
> instead.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe.c | 1 +
> drivers/infiniband/sw/rxe/rxe_net.c | 58 ++++++++++++++++++++-------
> drivers/infiniband/sw/rxe/rxe_net.h | 5 ---
> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
> 4 files changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index ebfabc6d6b76..e81c2164d77f 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -74,6 +74,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
> rxe->ndev->dev_addr);
>
> rxe->max_ucontext = RXE_MAX_UCONTEXT;
> + rxe->l_sk6 = NULL;
> }
>
> /* initialize port attributes */
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 4cc7de7b115b..b56e2c32fbf7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -18,8 +18,6 @@
> #include "rxe_net.h"
> #include "rxe_loc.h"
>
> -static struct rxe_recv_sockets recv_sockets;
> -
> static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
> struct net_device *ndev,
> struct in_addr *saddr,
> @@ -51,6 +49,23 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
> {
> struct dst_entry *ndst;
> struct flowi6 fl6 = { { 0 } };
> + struct rxe_dev *rdev;
> +
> + rdev = rxe_get_dev_from_net(ndev);
> + if (!rdev->l_sk6) {
> + struct sock *sk;
> +
> + rcu_read_lock();
> + sk = udp6_lib_lookup(&init_net, NULL, 0, &in6addr_any, htons(ROCE_V2_UDP_DPORT), 0);
> + rcu_read_unlock();
> + if (!sk) {
> + pr_info("file: %s +%d, error\n", __FILE__, __LINE__);
> + return (struct dst_entry *)sk;
> + }
> + __sock_put(sk);
> + rdev->l_sk6 = sk->sk_socket;
> + }
> +
>
> memset(&fl6, 0, sizeof(fl6));
> fl6.flowi6_oif = ndev->ifindex;
> @@ -58,8 +73,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
> memcpy(&fl6.daddr, daddr, sizeof(*daddr));
> fl6.flowi6_proto = IPPROTO_UDP;
>
> - ndst = ipv6_stub->ipv6_dst_lookup_flow(sock_net(recv_sockets.sk6->sk),
> - recv_sockets.sk6->sk, &fl6,
> + ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
> + rdev->l_sk6->sk, &fl6,
> NULL);
> if (IS_ERR(ndst)) {
> rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
> @@ -533,15 +548,33 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
> #define SK_REF_FOR_TUNNEL 2
> void rxe_net_del(struct ib_device *dev)
> {
> - if (refcount_read(&recv_sockets.sk6->sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> - __sock_put(recv_sockets.sk6->sk);
> + struct sock *sk;
> +
> + rcu_read_lock();
> + sk = udp4_lib_lookup(&init_net, 0, 0, htonl(INADDR_ANY), htons(ROCE_V2_UDP_DPORT), 0);
> + rcu_read_unlock();
> + if (!sk)
> + return;
> +
> + __sock_put(sk);
> +
> + if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> + __sock_put(sk);
> else
> - rxe_release_udp_tunnel(recv_sockets.sk6);
> + rxe_release_udp_tunnel(sk->sk_socket);
> +
> + rcu_read_lock();
> + sk = udp6_lib_lookup(&init_net, NULL, 0, &in6addr_any, htons(ROCE_V2_UDP_DPORT), 0);
> + rcu_read_unlock();
> + if (!sk)
> + return;
> +
> + __sock_put(sk);
>
> - if (refcount_read(&recv_sockets.sk4->sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> - __sock_put(recv_sockets.sk4->sk);
> + if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> + __sock_put(sk);
> else
> - rxe_release_udp_tunnel(recv_sockets.sk4);
> + rxe_release_udp_tunnel(sk->sk_socket);
> }
> #undef SK_REF_FOR_TUNNEL
>
> @@ -651,10 +684,8 @@ static int rxe_net_ipv4_init(void)
> sock = rxe_setup_udp_tunnel(&init_net, htons(ROCE_V2_UDP_DPORT), false);
> if (IS_ERR(sock)) {
> pr_err("Failed to create IPv4 UDP tunnel\n");
> - recv_sockets.sk4 = NULL;
> return -1;
> }
> - recv_sockets.sk4 = sock;
>
> return 0;
> }
> @@ -674,17 +705,14 @@ static int rxe_net_ipv6_init(void)
>
> sock = rxe_setup_udp_tunnel(&init_net, htons(ROCE_V2_UDP_DPORT), true);
> if (PTR_ERR(sock) == -EAFNOSUPPORT) {
> - recv_sockets.sk6 = NULL;
> pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
> return 0;
> }
>
> if (IS_ERR(sock)) {
> - recv_sockets.sk6 = NULL;
> pr_err("Failed to create IPv6 UDP tunnel\n");
> return -1;
> }
> - recv_sockets.sk6 = sock;
> #endif
> return 0;
> }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index f48f22f3353b..027b20e1bab6 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -11,11 +11,6 @@
> #include <net/if_inet6.h>
> #include <linux/module.h>
>
> -struct rxe_recv_sockets {
> - struct socket *sk4;
> - struct socket *sk6;
> -};
> -
> int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
> void rxe_net_del(struct ib_device *dev);
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index 19ddfa890480..52c4ef4d0305 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -408,6 +408,7 @@ struct rxe_dev {
>
> struct rxe_port port;
> struct crypto_shash *tfm;
> + struct socket *l_sk6;
> };
>
> static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
* Re: [PATCHv3 6/8] RDMA/rxe: add the support of net namespace
[not found] ` <20230214060634.427162-7-yanjun.zhu@intel.com>
@ 2023-02-23 13:14 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:14 UTC
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Originally, init_net is hard-coded as the net namespace. Now the net
> namespace of the net device is used, so more net namespaces are
> supported.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe.c | 2 +-
> drivers/infiniband/sw/rxe/rxe_net.c | 33 +++++++++++++++++------------
> drivers/infiniband/sw/rxe/rxe_net.h | 2 +-
> 3 files changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index e81c2164d77f..4a17e4a003f5 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -196,7 +196,7 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
> goto err;
> }
>
> - err = rxe_net_init();
> + err = rxe_net_init(ndev);
> if (err)
> return err;
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index b56e2c32fbf7..9af90587642a 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -32,7 +32,7 @@ static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
> memcpy(&fl.daddr, daddr, sizeof(*daddr));
> fl.flowi4_proto = IPPROTO_UDP;
>
> - rt = ip_route_output_key(&init_net, &fl);
> + rt = ip_route_output_key(dev_net(ndev), &fl);
> if (IS_ERR(rt)) {
> rxe_dbg_qp(qp, "no route to %pI4\n", &daddr->s_addr);
> return NULL;
> @@ -56,7 +56,8 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
> struct sock *sk;
>
> rcu_read_lock();
> - sk = udp6_lib_lookup(&init_net, NULL, 0, &in6addr_any, htons(ROCE_V2_UDP_DPORT), 0);
> + sk = udp6_lib_lookup(dev_net(ndev), NULL, 0, &in6addr_any,
> + htons(ROCE_V2_UDP_DPORT), 0);
> rcu_read_unlock();
> if (!sk) {
> pr_info("file: %s +%d, error\n", __FILE__, __LINE__);
> @@ -549,9 +550,13 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
> void rxe_net_del(struct ib_device *dev)
> {
> struct sock *sk;
> + struct rxe_dev *rdev;
> +
> + rdev = container_of(dev, struct rxe_dev, ib_dev);
>
> rcu_read_lock();
> - sk = udp4_lib_lookup(&init_net, 0, 0, htonl(INADDR_ANY), htons(ROCE_V2_UDP_DPORT), 0);
> + sk = udp4_lib_lookup(dev_net(rdev->ndev), 0, 0, htonl(INADDR_ANY),
> + htons(ROCE_V2_UDP_DPORT), 0);
> rcu_read_unlock();
> if (!sk)
> return;
> @@ -564,7 +569,8 @@ void rxe_net_del(struct ib_device *dev)
> rxe_release_udp_tunnel(sk->sk_socket);
>
> rcu_read_lock();
> - sk = udp6_lib_lookup(&init_net, NULL, 0, &in6addr_any, htons(ROCE_V2_UDP_DPORT), 0);
> + sk = udp6_lib_lookup(dev_net(rdev->ndev), NULL, 0, &in6addr_any,
> + htons(ROCE_V2_UDP_DPORT), 0);
> rcu_read_unlock();
> if (!sk)
> return;
> @@ -636,6 +642,7 @@ static int rxe_notify(struct notifier_block *not_blk,
> switch (event) {
> case NETDEV_UNREGISTER:
> ib_unregister_device_queued(&rxe->ib_dev);
> + rxe_net_del(&rxe->ib_dev);
> break;
> case NETDEV_UP:
> rxe_port_up(rxe);
> @@ -669,19 +676,19 @@ static struct notifier_block rxe_net_notifier = {
> .notifier_call = rxe_notify,
> };
>
> -static int rxe_net_ipv4_init(void)
> +static int rxe_net_ipv4_init(struct net_device *ndev)
> {
> struct sock *sk;
> struct socket *sock;
>
> rcu_read_lock();
> - sk = udp4_lib_lookup(&init_net, 0, 0, htonl(INADDR_ANY),
> + sk = udp4_lib_lookup(dev_net(ndev), 0, 0, htonl(INADDR_ANY),
> htons(ROCE_V2_UDP_DPORT), 0);
> rcu_read_unlock();
> if (sk)
> return 0;
>
> - sock = rxe_setup_udp_tunnel(&init_net, htons(ROCE_V2_UDP_DPORT), false);
> + sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
> if (IS_ERR(sock)) {
> pr_err("Failed to create IPv4 UDP tunnel\n");
> return -1;
> @@ -690,20 +697,20 @@ static int rxe_net_ipv4_init(void)
> return 0;
> }
>
> -static int rxe_net_ipv6_init(void)
> +static int rxe_net_ipv6_init(struct net_device *ndev)
> {
> #if IS_ENABLED(CONFIG_IPV6)
> struct sock *sk;
> struct socket *sock;
>
> rcu_read_lock();
> - sk = udp6_lib_lookup(&init_net, NULL, 0, &in6addr_any,
> + sk = udp6_lib_lookup(dev_net(ndev), NULL, 0, &in6addr_any,
> htons(ROCE_V2_UDP_DPORT), 0);
> rcu_read_unlock();
> if (sk)
> return 0;
>
> - sock = rxe_setup_udp_tunnel(&init_net, htons(ROCE_V2_UDP_DPORT), true);
> + sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
> if (PTR_ERR(sock) == -EAFNOSUPPORT) {
> pr_warn("IPv6 is not supported, can not create a UDPv6 socket\n");
> return 0;
> @@ -735,14 +742,14 @@ void rxe_net_exit(void)
> unregister_netdevice_notifier(&rxe_net_notifier);
> }
>
> -int rxe_net_init(void)
> +int rxe_net_init(struct net_device *ndev)
> {
> int err;
>
> - err = rxe_net_ipv4_init();
> + err = rxe_net_ipv4_init(ndev);
> if (err)
> return err;
> - err = rxe_net_ipv6_init();
> + err = rxe_net_ipv6_init(ndev);
> if (err)
> goto err_out;
> return 0;
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
> index 027b20e1bab6..56249677d692 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.h
> +++ b/drivers/infiniband/sw/rxe/rxe_net.h
> @@ -15,7 +15,7 @@ int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
> void rxe_net_del(struct ib_device *dev);
>
> int rxe_register_notifier(void);
> -int rxe_net_init(void);
> +int rxe_net_init(struct net_device *ndev);
> void rxe_net_exit(void);
>
> #endif /* RXE_NET_H */
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHv3 7/8] RDMA/rxe: Add the support of net namespace notifier
[not found] ` <20230214060634.427162-8-yanjun.zhu@intel.com>
@ 2023-02-23 13:14 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:14 UTC (permalink / raw)
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> The functions register_pernet_subsys/unregister_pernet_subsys register a
> notifier of net namespace. When a new net namespace is created, the init
> function of rxe will be called to initialize sk4 and sk6 socks. When a
> net namespace is destroyed, the exit function will be called to handle
> sk4 and sk6 socks.
>
> The functions rxe_ns_pernet_sk4 and rxe_ns_pernet_sk6 are used to get
> sk4 and sk6 socks.
>
> The functions rxe_ns_pernet_set_sk4 and rxe_ns_pernet_set_sk6 are used
> to set sk4 and sk6 socks.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/Makefile | 3 +-
> drivers/infiniband/sw/rxe/rxe.c | 9 ++
> drivers/infiniband/sw/rxe/rxe_net.c | 50 +++++------
> drivers/infiniband/sw/rxe/rxe_ns.c | 134 ++++++++++++++++++++++++++++
> drivers/infiniband/sw/rxe/rxe_ns.h | 17 ++++
> 5 files changed, 187 insertions(+), 26 deletions(-)
> create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
> create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h
>
> diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
> index 5395a581f4bb..8380f97674cb 100644
> --- a/drivers/infiniband/sw/rxe/Makefile
> +++ b/drivers/infiniband/sw/rxe/Makefile
> @@ -22,4 +22,5 @@ rdma_rxe-y := \
> rxe_mcast.o \
> rxe_task.o \
> rxe_net.o \
> - rxe_hw_counters.o
> + rxe_hw_counters.o \
> + rxe_ns.o
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 4a17e4a003f5..c297677bf06a 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -9,6 +9,7 @@
> #include "rxe.h"
> #include "rxe_loc.h"
> #include "rxe_net.h"
> +#include "rxe_ns.h"
>
> MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
> MODULE_DESCRIPTION("Soft RDMA transport");
> @@ -234,6 +235,12 @@ static int __init rxe_module_init(void)
> return -1;
> }
>
> + err = rxe_namespace_init();
> + if (err) {
> + pr_err("Failed to register net namespace notifier\n");
> + return -1;
> + }
> +
> pr_info("loaded\n");
> return 0;
> }
> @@ -244,6 +251,8 @@ static void __exit rxe_module_exit(void)
> ib_unregister_driver(RDMA_DRIVER_RXE);
> rxe_net_exit();
>
> + rxe_namespace_exit();
> +
> pr_info("unloaded\n");
> }
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 9af90587642a..8135876b11f6 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -17,6 +17,7 @@
> #include "rxe.h"
> #include "rxe_net.h"
> #include "rxe_loc.h"
> +#include "rxe_ns.h"
>
> static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
> struct net_device *ndev,
> @@ -554,33 +555,30 @@ void rxe_net_del(struct ib_device *dev)
>
> rdev = container_of(dev, struct rxe_dev, ib_dev);
>
> - rcu_read_lock();
> - sk = udp4_lib_lookup(dev_net(rdev->ndev), 0, 0, htonl(INADDR_ANY),
> - htons(ROCE_V2_UDP_DPORT), 0);
> - rcu_read_unlock();
> + sk = rxe_ns_pernet_sk4(dev_net(rdev->ndev));
> if (!sk)
> return;
>
> - __sock_put(sk);
>
> - if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> + if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
> __sock_put(sk);
> - else
> + } else {
> rxe_release_udp_tunnel(sk->sk_socket);
> + sk = NULL;
> + rxe_ns_pernet_set_sk4(dev_net(rdev->ndev), sk);
> + }
>
> - rcu_read_lock();
> - sk = udp6_lib_lookup(dev_net(rdev->ndev), NULL, 0, &in6addr_any,
> - htons(ROCE_V2_UDP_DPORT), 0);
> - rcu_read_unlock();
> + sk = rxe_ns_pernet_sk6(dev_net(rdev->ndev));
> if (!sk)
> return;
>
> - __sock_put(sk);
> -
> - if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL)
> + if (refcount_read(&sk->sk_refcnt) > SK_REF_FOR_TUNNEL) {
> __sock_put(sk);
> - else
> + } else {
> rxe_release_udp_tunnel(sk->sk_socket);
> + sk = NULL;
> + rxe_ns_pernet_set_sk6(dev_net(rdev->ndev), sk);
> + }
> }
> #undef SK_REF_FOR_TUNNEL
>
> @@ -681,18 +679,18 @@ static int rxe_net_ipv4_init(struct net_device *ndev)
> struct sock *sk;
> struct socket *sock;
>
> - rcu_read_lock();
> - sk = udp4_lib_lookup(dev_net(ndev), 0, 0, htonl(INADDR_ANY),
> - htons(ROCE_V2_UDP_DPORT), 0);
> - rcu_read_unlock();
> - if (sk)
> + sk = rxe_ns_pernet_sk4(dev_net(ndev));
> + if (sk) {
> + sock_hold(sk);
> return 0;
> + }
>
> sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), false);
> if (IS_ERR(sock)) {
> pr_err("Failed to create IPv4 UDP tunnel\n");
> return -1;
> }
> + rxe_ns_pernet_set_sk4(dev_net(ndev), sock->sk);
>
> return 0;
> }
> @@ -703,12 +701,11 @@ static int rxe_net_ipv6_init(struct net_device *ndev)
> struct sock *sk;
> struct socket *sock;
>
> - rcu_read_lock();
> - sk = udp6_lib_lookup(dev_net(ndev), NULL, 0, &in6addr_any,
> - htons(ROCE_V2_UDP_DPORT), 0);
> - rcu_read_unlock();
> - if (sk)
> + sk = rxe_ns_pernet_sk6(dev_net(ndev));
> + if (sk) {
> + sock_hold(sk);
> return 0;
> + }
>
> sock = rxe_setup_udp_tunnel(dev_net(ndev), htons(ROCE_V2_UDP_DPORT), true);
> if (PTR_ERR(sock) == -EAFNOSUPPORT) {
> @@ -720,6 +717,9 @@ static int rxe_net_ipv6_init(struct net_device *ndev)
> pr_err("Failed to create IPv6 UDP tunnel\n");
> return -1;
> }
> +
> + rxe_ns_pernet_set_sk6(dev_net(ndev), sock->sk);
> +
> #endif
> return 0;
> }
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.c b/drivers/infiniband/sw/rxe/rxe_ns.c
> new file mode 100644
> index 000000000000..29d08899dcda
> --- /dev/null
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
> + * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
> + */
> +
> +#include <net/sock.h>
> +#include <net/netns/generic.h>
> +#include <net/net_namespace.h>
> +#include <linux/module.h>
> +#include <linux/skbuff.h>
> +#include <linux/pid_namespace.h>
> +#include <net/udp_tunnel.h>
> +
> +#include "rxe_ns.h"
> +
> +/*
> + * Per network namespace data
> + */
> +struct rxe_ns_sock {
> + struct sock __rcu *rxe_sk4;
> + struct sock __rcu *rxe_sk6;
> +};
> +
> +/*
> + * Index to store custom data for each network namespace.
> + */
> +static unsigned int rxe_pernet_id;
> +
> +/*
> + * Called for every existing and added network namespaces
> + */
> +static int __net_init rxe_ns_init(struct net *net)
> +{
> + /*
> + * create (if not present) and access data item in network namespace
> + * (net) using the id (net_id)
> + */
> + struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> + rcu_assign_pointer(ns_sk->rxe_sk4, NULL); /* initialize sock 4 socket */
> + rcu_assign_pointer(ns_sk->rxe_sk6, NULL); /* initialize sock 6 socket */
> + synchronize_rcu();
> +
> + return 0;
> +}
> +
> +static void __net_exit rxe_ns_exit(struct net *net)
> +{
> + /*
> + * called when the network namespace is removed
> + */
> + struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> + struct sock *rxe_sk4 = NULL;
> + struct sock *rxe_sk6 = NULL;
> +
> + rcu_read_lock();
> + rxe_sk4 = rcu_dereference(ns_sk->rxe_sk4);
> + rxe_sk6 = rcu_dereference(ns_sk->rxe_sk6);
> + rcu_read_unlock();
> +
> + /* close socket */
> + if (rxe_sk4 && rxe_sk4->sk_socket) {
> + udp_tunnel_sock_release(rxe_sk4->sk_socket);
> + rcu_assign_pointer(ns_sk->rxe_sk4, NULL);
> + synchronize_rcu();
> + }
> +
> + if (rxe_sk6 && rxe_sk6->sk_socket) {
> + udp_tunnel_sock_release(rxe_sk6->sk_socket);
> + rcu_assign_pointer(ns_sk->rxe_sk6, NULL);
> + synchronize_rcu();
> + }
> +}
> +
> +/*
> + * callback to make the module network namespace aware
> + */
> +static struct pernet_operations rxe_net_ops __net_initdata = {
> + .init = rxe_ns_init,
> + .exit = rxe_ns_exit,
> + .id = &rxe_pernet_id,
> + .size = sizeof(struct rxe_ns_sock),
> +};
> +
> +struct sock *rxe_ns_pernet_sk4(struct net *net)
> +{
> + struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> + struct sock *sk;
> +
> + rcu_read_lock();
> + sk = rcu_dereference(ns_sk->rxe_sk4);
> + rcu_read_unlock();
> +
> + return sk;
> +}
> +
> +void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk)
> +{
> + struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> + rcu_assign_pointer(ns_sk->rxe_sk4, sk);
> + synchronize_rcu();
> +}
> +
> +struct sock *rxe_ns_pernet_sk6(struct net *net)
> +{
> + struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> + struct sock *sk;
> +
> + rcu_read_lock();
> + sk = rcu_dereference(ns_sk->rxe_sk6);
> + rcu_read_unlock();
> +
> + return sk;
> +}
> +
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk)
> +{
> + struct rxe_ns_sock *ns_sk = net_generic(net, rxe_pernet_id);
> +
> + rcu_assign_pointer(ns_sk->rxe_sk6, sk);
> + synchronize_rcu();
> +}
> +
> +int __init rxe_namespace_init(void)
> +{
> + return register_pernet_subsys(&rxe_net_ops);
> +}
> +
> +void __exit rxe_namespace_exit(void)
> +{
> + unregister_pernet_subsys(&rxe_net_ops);
> +}
> diff --git a/drivers/infiniband/sw/rxe/rxe_ns.h b/drivers/infiniband/sw/rxe/rxe_ns.h
> new file mode 100644
> index 000000000000..a3eac9558889
> --- /dev/null
> +++ b/drivers/infiniband/sw/rxe/rxe_ns.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> +/*
> + * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
> + * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
> + */
> +
> +#ifndef RXE_NS_H
> +#define RXE_NS_H
> +
> +struct sock *rxe_ns_pernet_sk4(struct net *net);
> +struct sock *rxe_ns_pernet_sk6(struct net *net);
> +void rxe_ns_pernet_set_sk4(struct net *net, struct sock *sk);
> +void rxe_ns_pernet_set_sk6(struct net *net, struct sock *sk);
> +int __init rxe_namespace_init(void);
> +void __exit rxe_namespace_exit(void);
> +
> +#endif /* RXE_NS_H */
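
For readers following the link-add path in this patch, the "reuse the
per-namespace sock or create a new one" logic can be modeled in user
space. What follows is only a minimal sketch under stated assumptions,
not the kernel code: struct model_sock, struct model_ns and the model_*
functions are hypothetical stand-ins for struct sock, struct rxe_ns_sock
and the rxe_ns_pernet_sk4()/rxe_ns_pernet_set_sk4() helpers, and a plain
integer replaces sk_refcnt. RCU and locking are left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: only a reference count. */
struct model_sock {
	int refcnt;
};

/* Hypothetical stand-in for the per-namespace data (rxe_ns_sock). */
struct model_ns {
	struct model_sock *sk4;
	struct model_sock *sk6;
};

/* Getter, in the role of rxe_ns_pernet_sk4(). */
struct model_sock *model_ns_sk4(struct model_ns *ns)
{
	return ns->sk4;
}

/* Setter, in the role of rxe_ns_pernet_set_sk4(). */
void model_ns_set_sk4(struct model_ns *ns, struct model_sock *sk)
{
	ns->sk4 = sk;
}

/*
 * Same shape as rxe_net_ipv4_init() in this patch: if the namespace
 * already has a sock, take one more reference; otherwise create one
 * and store it in the namespace. Returns 0 on success, -1 on failure.
 */
int model_link_add_v4(struct model_ns *ns)
{
	struct model_sock *sk = model_ns_sk4(ns);

	if (sk) {
		sk->refcnt++;	/* plays the role of sock_hold() */
		return 0;
	}

	sk = malloc(sizeof(*sk));
	if (!sk)
		return -1;

	sk->refcnt = 1;		/* model value, not the kernel's count */
	model_ns_set_sk4(ns, sk);
	return 0;
}
```

Calling model_link_add_v4() twice on the same model_ns leaves one sock
with two references, while a second model_ns gets its own sock. That is
the per-namespace isolation the patch is after.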
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHv3 8/8] RDMA/rxe: Replace l_sk6 with sk6 in net namespace
[not found] ` <20230214060634.427162-9-yanjun.zhu@intel.com>
@ 2023-02-23 13:15 ` Zhu Yanjun
0 siblings, 0 replies; 12+ messages in thread
From: Zhu Yanjun @ 2023-02-23 13:15 UTC (permalink / raw)
To: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
Cc: Zhu Yanjun
On 2023/2/14 14:06, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> The net namespace variable sk6 can be used. As such, l_sk6 can be
> replaced with it.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Add netdev@vger.kernel.org.
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe.c | 1 -
> drivers/infiniband/sw/rxe/rxe_net.c | 20 +-------------------
> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 -
> 3 files changed, 1 insertion(+), 21 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index c297677bf06a..3260f598a7fb 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -75,7 +75,6 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
> rxe->ndev->dev_addr);
>
> rxe->max_ucontext = RXE_MAX_UCONTEXT;
> - rxe->l_sk6 = NULL;
> }
>
> /* initialize port attributes */
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index 8135876b11f6..ebcb86fa1e5e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -50,24 +50,6 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
> {
> struct dst_entry *ndst;
> struct flowi6 fl6 = { { 0 } };
> - struct rxe_dev *rdev;
> -
> - rdev = rxe_get_dev_from_net(ndev);
> - if (!rdev->l_sk6) {
> - struct sock *sk;
> -
> - rcu_read_lock();
> - sk = udp6_lib_lookup(dev_net(ndev), NULL, 0, &in6addr_any,
> - htons(ROCE_V2_UDP_DPORT), 0);
> - rcu_read_unlock();
> - if (!sk) {
> - pr_info("file: %s +%d, error\n", __FILE__, __LINE__);
> - return (struct dst_entry *)sk;
> - }
> - __sock_put(sk);
> - rdev->l_sk6 = sk->sk_socket;
> - }
> -
>
> memset(&fl6, 0, sizeof(fl6));
> fl6.flowi6_oif = ndev->ifindex;
> @@ -76,7 +58,7 @@ static struct dst_entry *rxe_find_route6(struct rxe_qp *qp,
> fl6.flowi6_proto = IPPROTO_UDP;
>
> ndst = ipv6_stub->ipv6_dst_lookup_flow(dev_net(ndev),
> - rdev->l_sk6->sk, &fl6,
> + rxe_ns_pernet_sk6(dev_net(ndev)), &fl6,
> NULL);
> if (IS_ERR(ndst)) {
> rxe_dbg_qp(qp, "no route to %pI6\n", daddr);
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index 52c4ef4d0305..19ddfa890480 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -408,7 +408,6 @@ struct rxe_dev {
>
> struct rxe_port port;
> struct crypto_shash *tfm;
> - struct socket *l_sk6;
> };
>
> static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters index)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace
2023-02-23 0:31 ` [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace Zhu Yanjun
2023-02-23 4:56 ` Jakub Kicinski
@ 2023-02-25 8:43 ` Rain River
1 sibling, 0 replies; 12+ messages in thread
From: Rain River @ 2023-02-25 8:43 UTC (permalink / raw)
To: Zhu Yanjun
Cc: Zhu Yanjun, jgg, leon, zyjzyj2000, linux-rdma, parav,
netdev@vger.kernel.org
On Thu, Feb 23, 2023 at 8:37 AM Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
>
> On 2023/2/14 14:06, Zhu Yanjun wrote:
> > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> >
> > When the "rdma link add" command is run to add an rxe rdma link in a
> > net namespace, this rxe rdma link normally does not work in that net
> > namespace.
> >
> > The root cause is that a sock listening on udp port 4791 is created
> > in init_net when the rdma_rxe module is loaded into the kernel. Other
> > net namespaces cannot easily use this sock.
> >
> > The following commits will solve this problem.
> >
> > In the first commit, the creation of the sock listening on udp port
> > 4791 is moved from the module_init function to the rdma link creation
> > functions. That is, the sock is no longer created when the rdma_rxe
> > module is loaded; it is created when the "rdma link add ..." command
> > is run. So when an rdma link is created in a net namespace, the sock
> > is created in that net namespace.
> >
> > In the second commit, the functions udp4_lib_lookup and udp6_lib_lookup
> > check whether the sock exists in the net namespace. If it does, the
> > rdma link increases the reference count of this sock and continues,
> > instead of creating a new sock to listen on udp port 4791. Since the
> > network notifier is global, it is registered when the rdma_rxe module
> > is loaded.
> >
> > After the rdma link is created, the command "rdma link del" deletes
> > the rdma link and checks the sock at the same time. If the reference
> > count of this sock is greater than the sock reference count needed by
> > the udp tunnel, the sock reference count is decreased by one. If equal,
> > this rdma link is the last one, so the udp tunnel is shut down and the
> > sock is closed. This work belongs in a dellink function, but rxe
> > currently has none. So the 3rd commit adds a dellink function pointer,
> > and the 4th commit implements the dellink function in rxe.
> >
> > At this point, it is no longer necessary to keep a global variable to
> > store the sock listening on udp port 4791. This global variable can be
> > replaced entirely by the functions udp4_lib_lookup and udp6_lib_lookup.
> > Because the function udp6_lib_lookup is in the fast path, a member
> > variable l_sk6 is added to store the sock. If l_sk6 is NULL,
> > udp6_lib_lookup is called to look up the sock, and the sock is then
> > stored in l_sk6 so that it can be used directly in the future.
> >
> > All the above work has been done in init_net, and it also works in
> > other net namespaces. So init_net is replaced by the individual net
> > namespace. This is what the 6th commit does. Because an rxe device
> > depends on the net device and the sock listening on udp port 4791,
> > every rxe device is in exclusive mode in its individual net namespace.
> > Other rdma netns operations will be considered in the future.
> >
> > In the 7th commit, the register_pernet_subsys/unregister_pernet_subsys
> > functions are added. When a new net namespace is created, the init
> > function initializes the sk4 and sk6 socks. The 2 socks are released
> > when the net namespace is destroyed. The functions
> > rxe_ns_pernet_sk4/rxe_ns_pernet_set_sk4 get and set sk4 in the net
> > namespace. The functions rxe_ns_pernet_sk6/rxe_ns_pernet_set_sk6
> > handle sk6. The sk4 and sk6 are then used in the previous commits.
> >
> > As the per-net-namespace sk4 and sk6 can be accessed, it is not
> > necessary to add a new l_sk6. As such, in the 8th commit, l_sk6 is
> > replaced with the per-net-namespace sk6.
> >
> > Test steps:
> > 1) Suppose that 2 NICs are in 2 different net namespaces.
> >
> > # ip netns exec net0 ip link
> > 3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
> > link/ether 00:1e:67:a0:22:3f brd ff:ff:ff:ff:ff:ff
> > altname enp5s0
> >
> > # ip netns exec net1 ip link
> > 4: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> > link/ether f8:e4:3b:3b:e4:10 brd ff:ff:ff:ff:ff:ff
> >
> > 2) Add rdma link in the different net namespace
> > net0:
> > # ip netns exec net0 rdma link add rxe0 type rxe netdev eno2
> >
> > net1:
> > # ip netns exec net1 rdma link add rxe1 type rxe netdev eno3
> >
> > 3) Run rping test.
> > net0
> > # ip netns exec net0 rping -s -a 192.168.2.1 -C 1&
> > [1] 1737
> > # ip netns exec net1 rping -c -a 192.168.2.1 -d -v -C 1
> > verbose
> > count 1
> > ...
> > ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
> > ...
> >
> > 4) Remove the rdma links from the net namespaces.
> > net0:
> > # ip netns exec net0 ss -lu
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> > UNCONN 0 0 0.0.0.0:4791 0.0.0.0:*
> > UNCONN 0 0 [::]:4791 [::]:*
> >
> > # ip netns exec net0 rdma link del rxe0
> >
> > # ip netns exec net0 ss -lu
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> >
> > net1:
> > # ip netns exec net1 ss -lu
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> > UNCONN 0 0 0.0.0.0:4791 0.0.0.0:*
> > UNCONN 0 0 [::]:4791 [::]:*
> >
> > # ip netns exec net1 rdma link del rxe1
> >
> > # ip netns exec net1 ss -lu
> > State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
> >
> > V2->V3: 1) Add "rdma link del" example in the cover letter, and use "ss -lu" to
> > verify rdma link is removed.
> > 2) Add register_pernet_subsys/unregister_pernet_subsys net namespace
> > 3) Replace l_sk6 with sk6 of pernet_name_space
Thanks,
Tested-by: Rain River <rain.1986.08.12@gmail.com>
> >
> > V1->V2: Add the explicit initialization of sk6.
>
> Add netdev@vger.kernel.org.
>
> Zhu Yanjun
>
> >
> > Zhu Yanjun (8):
> > RDMA/rxe: Creating listening sock in newlink function
> > RDMA/rxe: Support more rdma links in init_net
> > RDMA/nldev: Add dellink function pointer
> > RDMA/rxe: Implement dellink in rxe
> > RDMA/rxe: Replace global variable with sock lookup functions
> > RDMA/rxe: add the support of net namespace
> > RDMA/rxe: Add the support of net namespace notifier
> > RDMA/rxe: Replace l_sk6 with sk6 in net namespace
> >
> > drivers/infiniband/core/nldev.c | 6 ++
> > drivers/infiniband/sw/rxe/Makefile | 3 +-
> > drivers/infiniband/sw/rxe/rxe.c | 35 +++++++-
> > drivers/infiniband/sw/rxe/rxe_net.c | 113 +++++++++++++++++-------
> > drivers/infiniband/sw/rxe/rxe_net.h | 9 +-
> > drivers/infiniband/sw/rxe/rxe_ns.c | 128 ++++++++++++++++++++++++++++
> > drivers/infiniband/sw/rxe/rxe_ns.h | 11 +++
> > include/rdma/rdma_netlink.h | 2 +
> > 8 files changed, 267 insertions(+), 40 deletions(-)
> > create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.c
> > create mode 100644 drivers/infiniband/sw/rxe/rxe_ns.h
> >
>
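
The reference-count rule that the cover letter describes for "rdma link
del" can be modeled in user space. This is only a rough sketch under
stated assumptions, not the code from the series: struct model_sock and
model_link_del() are hypothetical names, MODEL_REF_FOR_TUNNEL is an
illustrative value standing in for SK_REF_FOR_TUNNEL (whose real value
is defined elsewhere in the series), and free() stands in for releasing
the udp tunnel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative value only; stands in for SK_REF_FOR_TUNNEL. */
#define MODEL_REF_FOR_TUNNEL 1

/* Hypothetical stand-in for struct sock: only a reference count. */
struct model_sock {
	int refcnt;
};

/*
 * Drop one rdma link's use of the shared sock stored in *slot.
 *
 * If the count is above what the tunnel itself needs, other links
 * still use the sock, so only the count is decreased. Otherwise this
 * was the last link: the sock is released and the slot is cleared.
 *
 * Returns 1 if the sock was released, 0 otherwise.
 */
int model_link_del(struct model_sock **slot)
{
	struct model_sock *sk = *slot;

	if (!sk)
		return 0;

	if (sk->refcnt > MODEL_REF_FOR_TUNNEL) {
		sk->refcnt--;	/* plays the role of __sock_put() */
		return 0;
	}

	free(sk);		/* plays the role of releasing the tunnel */
	*slot = NULL;
	return 1;
}
```

With two links sharing one sock, the first model_link_del() only drops
a reference and the second one releases the sock and clears the slot,
which matches the "ss -lu" output in the test steps going empty after
the last link is deleted.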
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-02-25 8:43 UTC | newest]
Thread overview: 12+ messages
[not found] <20230214060634.427162-1-yanjun.zhu@intel.com>
2023-02-23 0:31 ` [PATCHv3 0/8] Fix the problem that rxe can not work in net namespace Zhu Yanjun
2023-02-23 4:56 ` Jakub Kicinski
2023-02-23 11:42 ` Zhu Yanjun
2023-02-25 8:43 ` Rain River
[not found] ` <20230214060634.427162-2-yanjun.zhu@intel.com>
2023-02-23 13:10 ` [PATCHv3 1/8] RDMA/rxe: Creating listening sock in newlink function Zhu Yanjun
[not found] ` <20230214060634.427162-3-yanjun.zhu@intel.com>
2023-02-23 13:10 ` [PATCHv3 2/8] RDMA/rxe: Support more rdma links in init_net Zhu Yanjun
[not found] ` <20230214060634.427162-4-yanjun.zhu@intel.com>
2023-02-23 13:11 ` [PATCHv3 3/8] RDMA/nldev: Add dellink function pointer Zhu Yanjun
[not found] ` <20230214060634.427162-5-yanjun.zhu@intel.com>
2023-02-23 13:12 ` [PATCHv3 4/8] RDMA/rxe: Implement dellink in rxe Zhu Yanjun
[not found] ` <20230214060634.427162-6-yanjun.zhu@intel.com>
2023-02-23 13:13 ` [PATCHv3 5/8] RDMA/rxe: Replace global variable with sock lookup functions Zhu Yanjun
[not found] ` <20230214060634.427162-7-yanjun.zhu@intel.com>
2023-02-23 13:14 ` [PATCHv3 6/8] RDMA/rxe: add the support of net namespace Zhu Yanjun
[not found] ` <20230214060634.427162-8-yanjun.zhu@intel.com>
2023-02-23 13:14 ` [PATCHv3 7/8] RDMA/rxe: Add the support of net namespace notifier Zhu Yanjun
[not found] ` <20230214060634.427162-9-yanjun.zhu@intel.com>
2023-02-23 13:15 ` [PATCHv3 8/8] RDMA/rxe: Replace l_sk6 with sk6 in net namespace Zhu Yanjun