From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 26 Oct 2022 12:08:38 +0800
From: Dust Li <dust.li@linux.alibaba.com>
Reply-To: dust.li@linux.alibaba.com
To: Yanjun Zhu, Zhu Yanjun, jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH 0/3] RDMA net namespace
Message-ID: <20221026040838.GE56517@linux.alibaba.com>
In-Reply-To: <1bf1e54f-f8d2-6b8c-5ded-ef701719816d@linux.dev>
References: <20221024011007.GE63658@linux.alibaba.com> <20221023220450.2287909-1-yanjun.zhu@intel.com> <20221024115228.GF63658@linux.alibaba.com> <662d6804-0e16-117e-d4a4-9abd4a2e8c75@linux.dev> <20221024143521.GG63658@linux.alibaba.com> <1bf1e54f-f8d2-6b8c-5ded-ef701719816d@linux.dev>
X-Mailing-List: linux-rdma@vger.kernel.org

On Tue, Oct 25, 2022 at 10:51:38AM +0800, Yanjun Zhu wrote:
>
> On 2022/10/24 22:35, Dust Li wrote:
>> On Mon, Oct 24, 2022 at 09:12:56PM +0800, Yanjun Zhu wrote:
>>> On 2022/10/24 19:52, Dust Li wrote:
>>>> On Mon, Oct 24, 2022 at 06:15:01AM +0000, yanjun.zhu@linux.dev wrote:
>>>>> October 24, 2022 9:10 AM, "Dust Li" wrote:
>>>>>
>>>>>> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>>>>>>
>>>>>>> From: Zhu Yanjun
>>>>>>>
>>>>>>> There are shared and exclusive modes in the RDMA net namespace
>>>>>>> support. After discussion with Leon, these modes are compatible
>>>>>>> with legacy IB devices.
>>>>>>>
>>>>>>> For RoCE and iWARP devices, the ib devices should be in the same
>>>>>>> net namespace as the related net devices, regardless of shared or
>>>>>>> exclusive mode.
>>>>>> Does this mean that shared mode is no longer supported for RoCE
>>>>>> and iWARP devices?
>>>>> From the discussion, for a RoCE or iWARP device the ib devices and
>>>>> net devices should be in the same net namespace. So a RoCE or iWARP
>>>>> device has no shared/exclusive modes.
>>>>> Shared/exclusive modes are for legacy ib devices, such as ipoib.
>>>>>
>>>>> In this patch series, shared/exclusive modes are left for legacy ib
>>>>> devices. For a RoCE or iWARP device, we just keep the net devices
>>>>> and ib devices in the same net namespace.
>>>> I think this may limit the use cases of RoCE and iWARP.
>>>>
>>>> See the following use case:
>>>> In a container environment, we may have many containers on a host,
>>>> for example, more than 100. We don't have that many VFs, so we use
>>>> ipvlan or other virtual network devices for each container, and put
>>>> those virtual network devices into each container (net namespace).
>>>> Since we only use one physical network device for all those
>>>> containers, there is only one RoCE device.
>>>> If we don't support shared mode, we cannot even enable RDMA for
>>>> those containers with RoCE.
>>> You use ipvlan or other virtual network devices for each container.
>>>
>>> In these containers, you also use RDMA, correct?
>>>
>>> Since all the packets for these virtual network devices finally go
>>> through the physical network device, it should work without
>>> shared/exclusive modes.
>>>
>>> So we do not consider the shared/exclusive modes.
>> For the netdevice, that's true. But for RDMA, we would no longer even
>> see the ib device in the containers, so I think we cannot create QPs
>> or CQs, and RDMA is not available for these containers in this case.
>
> I cannot follow you.
>
> Do you mean that RDMA cannot be accessed in the container after these
> patches are applied?
>
> Can you share a test case with me?

OK, I will test your patch first, and if possible I will provide a test
case.

Thanks

>
> Zhu Yanjun
>
>>
>> Thanks
>>
>>> Zhu Yanjun
>>>
>>>> I don't know any other way to solve this; maybe I missed something?
>>>>
>>>> Thanks
>>>>
>>>>>>> In the first commit, when the net devices are moved to a new net
>>>>>>> namespace, the related ib devices are also moved to the same net
>>>>>>> namespace.
>>>>>>>
>>>>>>> In the second commit, the shared/exclusive modes still work with
>>>>>>> legacy ib devices. For RoCE and iWARP devices, these modes will
>>>>>>> not be considered.
>>>>>>>
>>>>>>> Because MLX4/5 do not call the function ib_device_set_netdev to
>>>>>>> map ib devices to the related net devices, the function
>>>>>>> ib_device_get_by_netdev cannot get ib devices from net devices.
>>>>>>> In the third commit, all the registered ib devices are parsed to
>>>>>>> get their net devices, which are then compared with the given net
>>>>>>> device.
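For reference, the legacy shared/exclusive namespace modes discussed in this thread are controlled through the iproute2 `rdma` tool. A minimal sketch, assuming root privileges, RDMA-capable hardware, and illustrative device/namespace names (mlx5_0, net1):

```shell
# Show the current RDMA net namespace mode; "shared" is the default,
# in which rdma devices are visible in all net namespaces.
rdma system show netns

# Switch to exclusive mode. This only succeeds while no net namespace
# other than init_net is using RDMA.
rdma system set netns exclusive

# In exclusive mode an rdma device must be assigned to a namespace
# explicitly; mlx5_0 and net1 are illustrative names.
ip netns add net1
rdma dev set mlx5_0 netns net1
ip netns exec net1 rdma link
```

In shared mode, by contrast, every namespace sees all rdma devices, which is what the ipvlan-based container setup described earlier in the thread relies on.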
>>>>>>>
>>>>>>> The steps to run the tests:
>>>>>>> 1) Create a new net namespace net0
>>>>>>>
>>>>>>>    ip netns add net0
>>>>>>>
>>>>>>> 2) Show the rdma links in init_net
>>>>>>>
>>>>>>>    rdma link
>>>>>>>
>>>>>>>    "
>>>>>>>    link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>>>>>>>    "
>>>>>>>
>>>>>>> 3) Move the net device to net namespace net0
>>>>>>>
>>>>>>>    ip link set enp7s0np1 netns net0
>>>>>>>
>>>>>>> 4) Show the rdma links in init_net again
>>>>>>>
>>>>>>>    rdma link
>>>>>>>
>>>>>>>    There are no rdma links.
>>>>>>>
>>>>>>> 5) Show the rdma links in net0
>>>>>>>
>>>>>>>    ip netns exec net0 rdma link
>>>>>>>
>>>>>>>    "
>>>>>>>    link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>>>>>>>    "
>>>>>>>
>>>>>>> We can confirm that the rdma links are moved to the same net
>>>>>>> namespace as the net devices.
>>>>>>>
>>>>>>> Zhu Yanjun (3):
>>>>>>>   RDMA/core: Move ib device to the same net namespace with net device
>>>>>>>   RDMA/core: The legacy IB devices still work with shared/exclusive mode
>>>>>>>   RDMA/core: Get all the ib devices from net devices
>>>>>>>
>>>>>>>  drivers/infiniband/core/device.c | 107 ++++++++++++++++++++++++++++++-
>>>>>>>  1 file changed, 105 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> --
>>>>>>> 2.27.0
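To undo the test steps quoted above, the net device can be moved back to the initial namespace; a sketch, assuming the same illustrative names (enp7s0np1, net0) and root privileges:

```shell
# Move the net device back to the initial net namespace ("netns 1"
# means the namespace of PID 1, i.e. init_net). With this series
# applied, the related ib device should move back along with it.
ip netns exec net0 ip link set enp7s0np1 netns 1

# The rdma link should be visible in init_net again.
rdma link

# Remove the now-empty test namespace.
ip netns del net0
```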