From: Leon Romanovsky <leon@kernel.org>
To: Kangjing Huang <huangkangjing@gmail.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>, linux-cifs@vger.kernel.org
Subject: Re: [PATCH net] net/smc: Fix lookup of netdev by using ib_device_get_netdev()
Date: Thu, 19 Dec 2024 18:56:16 +0200
Message-ID: <20241219165616.GF82731@unreal>
In-Reply-To: <CAPbmFQZL4us=CLvORKkEDBr+23zgLTSFDUUqv7OmBxdaSir_YA@mail.gmail.com>
On Sat, Dec 14, 2024 at 08:02:14AM +0000, Kangjing Huang wrote:
> On Sat, Dec 14, 2024 at 1:06 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> >
> >
> > On Sat, Dec 14, 2024, at 04:33, Namjae Jeon wrote:
> > > On Fri, Dec 13, 2024 at 8:07 PM Kangjing Huang <huangkangjing@gmail.com> wrote:
> > >>
> > >> Hi there,
> > >>
> > >> I am the original author of commit ecce70cf17d9 ("ksmbd: fix missing
> > >> RDMA-capable flag for IPoIB device in ksmbd_rdma_capable_netdev()"),
> > >> as mentioned in the thread.
> > >>
> > >> I am working on modifying the patch to take care of the layering
> > >> violation. The original patch was meant to fix an issue with ksmbd,
> > >> where an IPoIB netdev was not recognized as RDMA-capable. The original
> > >> version of the capability evaluation tries to match each netdev to
> > >> ib_device by calling get_netdev in ib verbs. However this only works
> > >> in cases where the ib_device is the upper layer of netdev (e.g. RoCE),
> > >> and since with IPoIB it is the other way around (netdev is the upper
> > >> layer of ib_device), get_netdev won't work anymore.
> > >>
> > >> I tried to replicate the behavior of device matching reversely in the
> > >> original version of my patch using GID, which ended up as the layering
> > >> violation. However I am unaware of any exported functions from the
> > >> IPoIB driver that could do the reverse lookup from netdev to the lower
> > >> layer ib_device. Actually it seems that the IPoIB driver does not have
> > >> any exported symbols at all.
> > >>
> > >> It might be that the device matching in reverse just does not make any
> > >> sense and does not need to be done at all. As long as it is an IPoIB
> > >> device (netdev->type == ARPHRD_INFINIBAND) it might be ok to just
> > >> automatically assume it is RDMA-capable. I am not 100% sure about this
> > >> though.
> > > Why can't we assume RDMA-capable if it's ARPHRD_INFINIBAND type?
> > > How about assuming it's RDMA-capable and allowing users to turn
> > > RDMA-capable on/off via sysfs?
> It does make more sense to me at this point to just broadly assume all
> ARPHRD_INFINIBAND types to be RDMA-capable; we just need to make sure
> this assumption indeed holds and figure out to what extent this could
> involve the same layering violation.
>
> >
> > Any attempt to treat ipoib differently from regular netdevice is wrong by definition.
> >
> I would agree that the design direction to treat ipoib as a pure
> regular net_device is the right way to go. But the problem with ksmbd
> and ipoib devices stems from the SMB protocol itself.
>
> In contrast to protocols that focus on certain functionalities like
> nfs, SMB actually tries to manage network interfaces actively in the
> protocol itself: SMB protocol's RDMA support (dubbed SMB Direct) is a
> sub-feature of SMB Multichannel. Multichannel is designed to let
> client and server find multiple data paths automatically (imagine a
> pair of hosts with multiple adapters connected by multiple cables) to
> increase bandwidth. So a client can initiate a
> FSCTL_QUERY_NETWORK_INTERFACE_INFO request and the server is expected to
> respond with NETWORK_INTERFACE_INFO containing _all_ local network
> interface information, including capabilities such as
> RDMA_CAPABLE (for details see ref [MS-SMB2] 3.3.5.15.11). Only upon
> seeing the capability flag would a client attempt to initiate an RDMA
> connection.
>
> Reference: [MS-SMB2](https://winprotocoldoc.z19.web.core.windows.net/MS-SMB2/%5bMS-SMB2%5d.pdf)
>
> TLDR is that the SMB protocol requires the server to enumerate all
> net_devices and indicate their RDMA capability, and
> ksmbd_rdma_capable_netdev() is only used in that process. Given such
> context, I wonder what should be the best way to approach this? Is
> using ARPHRD_INFINIBAND good enough and acceptable in terms of
> layering?
The thing is that ARPHRD_INFINIBAND indeed represents IPoIB, and it is
the right check for whether a netdev is IPoIB or not. The layering
problem is that upper layers (ULPs) should use it as a regular netdevice.
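
To make the check concrete, here is a rough userspace sketch, not kernel
code and not a proposed patch. struct demo_netdev, demo_is_ipoib() and the
DEMO_* constants are hypothetical names made up for illustration; only the
numeric values (1 for ARPHRD_ETHER, 32 for ARPHRD_INFINIBAND) mirror
include/uapi/linux/if_arp.h. It only shows how the type identifies IPoIB;
it says nothing about what a ULP should do with that answer.

```c
#include <stdbool.h>

/* Assumed to mirror ARPHRD_ETHER and ARPHRD_INFINIBAND from
 * include/uapi/linux/if_arp.h; redefined here so the sketch
 * builds outside the kernel tree. */
enum {
	DEMO_ARPHRD_ETHER      = 1,
	DEMO_ARPHRD_INFINIBAND = 32,
};

/* Hypothetical stand-in for the one net_device field the check reads. */
struct demo_netdev {
	const char *name;
	unsigned short type;	/* hardware type, ARPHRD_* */
};

/* Returns true when the device reports the IPoIB hardware type. */
static inline bool demo_is_ipoib(const struct demo_netdev *dev)
{
	return dev->type == DEMO_ARPHRD_INFINIBAND;
}
```

With this, a device such as { "ib0", DEMO_ARPHRD_INFINIBAND } is reported
as IPoIB and { "eth0", DEMO_ARPHRD_ETHER } is not.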
Thanks
>
> > >
> > > Thanks!
> > >>
> > >> I am uncertain about how to proceed at this point and would like to
> > >> know your thoughts and opinions on this.
> > >>
> > >> Thanks,
> > >> Kangjing
> > >>
> > >> On Fri, Nov 8, 2024 at 5:59 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >> >
> > >> > On Fri, Nov 08, 2024 at 08:40:40AM +0900, Namjae Jeon wrote:
> > >> > > On Thu, Nov 7, 2024 at 9:00 PM Halil Pasic <pasic@linux.ibm.com> wrote:
> > >> > > >
> > >> > > > On Wed, 6 Nov 2024 15:59:10 +0200
> > >> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > >> > > >
> > >> > > > > > Does fs/smb/server/transport_rdma.c qualify as inside of RDMA core code?
> > >> > > > >
> > >> > > > > RDMA core code is drivers/infiniband/core/*.
> > >> > > >
> > >> > > > Understood. So this is a violation of the no direct access to the
> > >> > > > callbacks rule.
> > >> > > >
> > >> > > > >
> > >> > > > > > I would guess it is not, and I would not actually mind sending a patch
> > >> > > > > > but I have trouble figuring out the logic behind commit ecce70cf17d9
> > >> > > > > > ("ksmbd: fix missing RDMA-capable flag for IPoIB device in
> > >> > > > > > ksmbd_rdma_capable_netdev()").
> > >> > > > >
> > >> > > > > It is a strange version of RDMA-CM. All other ULPs use RDMA-CM to avoid
> > >> > > > > GID, netdev and fabric complexity.
> > >> > > >
> > >> > > > I'm not familiar enough with either of the subsystems. Based on your
> > >> > > > answer my guess is that it ain't outright buggy but still a layering
> > >> > > > violation. Copying linux-cifs@vger.kernel.org so that
> > >> > > > the SMB folks are aware.
> > >> > > Could you please elaborate what the violation is ?
> > >> >
> > >> > There are many, but the most screaming is that ksmbd has logic to
> > >> > differentiate IPoIB devices. These devices are pure netdev devices
> > >> > and should be treated like that. ULPs should treat them exactly
> > >> > as they treat netdev devices.
> > >> >
> > >> > > I would also appreciate it if you could suggest to me how to fix this.
> > >> > >
> > >> > > Thanks.
> > >> > > >
> > >> > > > Thank you very much for all the explanations!
> > >> > > >
> > >> > > > Regards,
> > >> > > > Halil
> > >> > > >
> > >>
> > >>
> > >>
> > >> --
> > >> Kangjing "Chaser" Huang
>
>
>
> --
> Kangjing "Chaser" Huang