netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Gal Pressman <galpress@amazon.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Jakub Kicinski <kuba@kernel.org>,
	"Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, SW_Drivers <SW_Drivers@habana.ai>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	linux-rdma@vger.kernel.org, izur@habana.ai,
	Olof Johansson <olof@lixom.net>
Subject: Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
Date: Fri, 18 Sep 2020 16:09:55 +0300	[thread overview]
Message-ID: <20200918130955.GV869610@unreal> (raw)
In-Reply-To: <CAFCwf12KEa=chCZCWWkJ5bvGDeRCrmBcY9fB8CrtzjOknRQ5Qg@mail.gmail.com>

On Fri, Sep 18, 2020 at 03:31:51PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 3:19 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Fri, Sep 18, 2020 at 03:07:19PM +0300, Oded Gabbay wrote:
> > > On Fri, Sep 18, 2020 at 3:03 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Fri, Sep 18, 2020 at 02:56:09PM +0300, Oded Gabbay wrote:
> > > > > On Fri, Sep 18, 2020 at 2:52 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > > > > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > > > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > > > > > >> infrastructure for communication between multiple accelerators. Same
> > > > > > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > > > > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > > > > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > > > > > >> library or to connect to the rdma infrastructure in the kernel.
> > > > > > > >
> > > > > > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > > > > > you can't add random device offloads as IOCTL to nedevs.
> > > > > > > >
> > > > > > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > > > > > into netdev.
> > > > > > > >
> > > > > > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > > > > > all. I'm sure this can too.
> > > > > > >
> > > > > > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > > > > > was suggested to go through the vfio subsystem instead.
> > > > > > >
> > > > > > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > > > > > is what's the bar to get accepted to the RDMA subsystem.
> > > > > > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > > > > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> > > > > > >
> > > > > > > Does GAUDI fit these requirements? If not, should it be in a different subsystem
> > > > > > > or should we open the "what qualifies as an RDMA device" question again?
> > > > > >
> > > > > > I want to remind you that rdma-core requirement came to make sure that
> > > > > > anything exposed from the RDMA to the userspace is strict with proper
> > > > > > UAPI header hygiene.
> > > > > >
> > > > > > I doubt that Havana's ioctls are backed by anything like this.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Why do you doubt that ? Have you looked at our code ?
> > > > > Our uapi and IOCTLs interface is based on drm subsystem uapi interface
> > > > > and it is very safe and protected.
> > > >
> > > > Yes, I looked and didn't find open-source users of your UAPI headers.
> > > > It is not related to being safe or protected by to the common request
> > > > to present userspace that relies on those exported interfaces.
> > > >
> > > > > Otherwise Greg would have never allowed me to go upstream in the first place.
> > > >
> > > > Nice, can we get a link?
> > > >
> > > > >
> > > > > We have a single function which is the entry point for all the IOCTLs
> > > > > of our drivers (only one IOCTL is RDMA related, all the others are
> > > > > compute related).
> > > > > That function is almost 1:1 copy of the function in drm.
> > > >
> > > > DRM has same rules as RDMA, no kernel code will be merged without seeing
> > > > open-source userspace.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks,
> > > > > Oded
> > >
> > > So we do have an open-source library called hl-thunk, which uses our
> > > driver and indeed that was part of the requirement.
> > > It is similar to libdrm.
> > > Here is the link:
> > > https://github.com/HabanaAI/hl-thunk
> >
> > Are you kidding?
> >
> > This is mirror of some internal repository that looks like dumpster
> > with ChangeId, internal bug tracker numbers, not part of major OS
> > distributions.
> >
> > It is not open-source library and shows very clear why you chose
> > to upstream your driver through driver/misc/ tree.
> >
> > Thanks
>
> Adding Olof here.
>
> No, usually not.
> But are you kidding ?
> What did you exactly expect to find ? Is there an open-source project
> somewhere that encapsulates Deep-learning accelerators which I could
> connect to ?

I would expect certain level of code quality, collaboration and review
that distros require for inclusion. It is not the case for the github
repo you presented.

> AFAIK, the only thing remotely relevant is CUDA and that is
> closed-source (strange to hear lectures about open-source from NVIDIA
> people here...)

Please check git log statistics to estimate Nvidia/Mellanox/Cumulus
contributions to the Linux kernel and the open-source. You will be
surprised.

>
> So we are trying to give to the community such an open source library,
> or at least an example. Hopefully one day, when more companies
> upstream their drivers for deep-learning accelerators we could do
> something like libdrm or rdma-core, but for now, it's just our driver.

AFAIR, your driver is not unique, HiSilicon tried to submit something
similar years ago (warpdrive) and they are not alone.

>
> I have been in this community since 2013 with AMD and then RedHat, and
> I come with good intentions and a desire to open source and upstream
> as much as I can. I don't think I deserve this kind of response.

There is no need to take it personal. It was you who posted a link
to the github repo. What did you expect?

>
> The bottom line is that we had this discussion with Greg and Olof and
> DRM people almost 2 years ago and if there was some open-source
> project in user-space or some subsystem in the kernel we could connect
> to, we would have done that instead of what we did, but the fact of
> the matter there isn't such thing. Olof tried and is trying to create
> a h/w accelerator subsystem but it still hasn't got up from the ground
> yet.

Maybe it is a time to do it right.

>
> Oded

  reply	other threads:[~2020-09-18 13:10 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15 17:10 [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 01/14] habanalabs/gaudi: add NIC H/W and registers definitions Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 02/14] habanalabs/gaudi: add NIC firmware-related definitions Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 03/14] habanalabs/gaudi: add NIC security configuration Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 04/14] habanalabs/gaudi: add support for NIC QMANs Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 05/14] habanalabs/gaudi: add NIC Ethernet support Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 06/14] habanalabs/gaudi: add NIC PHY code Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 07/14] habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 08/14] habanalabs/gaudi: add a new IOCTL for NIC control operations Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 09/14] habanalabs/gaudi: add CQ " Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 10/14] habanalabs/gaudi: add WQ " Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 11/14] habanalabs/gaudi: add QP error handling Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 12/14] habanalabs/gaudi: Add ethtool support using coresight Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 13/14] habanalabs/gaudi: support DCB protocol Oded Gabbay
2020-09-15 17:10 ` [PATCH v3 14/14] habanalabs/gaudi: add NIC init/fini calls from common code Oded Gabbay
2020-09-15 20:35 ` [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver Jakub Kicinski
2020-09-15 20:46   ` Oded Gabbay
2020-09-15 21:04     ` Jakub Kicinski
2020-09-15 21:20       ` Oded Gabbay
2020-09-15 21:37         ` Andrew Lunn
2020-09-15 21:43           ` Oded Gabbay
2020-09-15 22:35             ` David Miller
2020-09-15 22:36           ` David Miller
2020-09-15 22:34         ` David Miller
2020-09-16  4:26           ` Oded Gabbay
2020-09-17 17:18     ` Jason Gunthorpe
2020-09-18 11:36       ` Gal Pressman
2020-09-18 11:52         ` Leon Romanovsky
2020-09-18 11:56           ` Oded Gabbay
2020-09-18 12:03             ` Leon Romanovsky
2020-09-18 12:07               ` Oded Gabbay
2020-09-18 12:19                 ` Leon Romanovsky
2020-09-18 12:31                   ` Oded Gabbay
2020-09-18 13:09                     ` Leon Romanovsky [this message]
2020-09-19  6:40                   ` Greg Kroah-Hartman
2020-09-19  8:20                     ` Leon Romanovsky
2020-09-19  8:30                       ` Greg Kroah-Hartman
2020-09-19  8:58                         ` Leon Romanovsky
2020-09-19 16:43                         ` Oded Gabbay
2020-09-19 17:27                           ` Greg Kroah-Hartman
2020-09-19 19:22                             ` Jason Gunthorpe
2020-09-20  8:47                               ` Greg Kroah-Hartman
2020-09-20 19:05                                 ` Oded Gabbay
2020-09-21 10:39                                   ` Leon Romanovsky
2020-09-21 11:52                                 ` Jason Gunthorpe
2020-09-21 21:20                                   ` Jakub Kicinski
2020-09-22 11:49                                     ` Jason Gunthorpe
2020-09-19 18:49                           ` Andrew Lunn
2020-09-18 11:56         ` Jason Gunthorpe
2020-09-18 11:59           ` Oded Gabbay
2020-09-18 12:16             ` Jason Gunthorpe
2020-09-18 12:34               ` Oded Gabbay
2020-09-18 12:50                 ` Jason Gunthorpe
2020-09-18 13:02                   ` Oded Gabbay
2020-09-18 13:26                     ` Jason Gunthorpe
2020-09-18 13:49                       ` Oded Gabbay
2020-09-18 13:59                         ` Jason Gunthorpe
2020-09-18 14:12                           ` Oded Gabbay
2020-09-18 14:19                             ` Jason Gunthorpe
2020-09-18 14:45                               ` Oded Gabbay
2020-09-18 15:07                                 ` Jason Gunthorpe
2020-09-18 15:15                                   ` Oded Gabbay
2020-09-18 15:28                                     ` Jason Gunthorpe
2020-09-21 11:22                                       ` Gal Pressman
2020-09-21 11:49                                         ` Leon Romanovsky
2020-09-22 11:41                                         ` Jason Gunthorpe
2020-09-22 12:46                                           ` Gal Pressman
2020-09-22 16:14                                             ` Jason Gunthorpe
2020-09-22 16:30                                               ` Gal Pressman
2020-09-22 16:52                                                 ` Jason Gunthorpe
2020-09-18 12:10         ` Oded Gabbay
2020-09-15 20:42 ` David Miller
2020-09-15 20:49   ` Oded Gabbay
2020-09-16  6:26     ` Greg Kroah-Hartman
2020-09-16  6:36       ` Oded Gabbay
2020-09-16  7:42         ` Greg Kroah-Hartman
2020-09-16  8:02           ` Oded Gabbay
2020-09-16  8:22             ` Greg Kroah-Hartman
2020-09-16  8:47               ` Oded Gabbay
2020-09-16 12:00                 ` Greg Kroah-Hartman
2020-09-20 16:45                   ` Daniel Vetter
2020-09-16 23:04               ` Williams, Dan J
2020-09-18 12:00 ` Jason Gunthorpe
2020-09-18 12:01   ` Oded Gabbay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200918130955.GV869610@unreal \
    --to=leon@kernel.org \
    --cc=SW_Drivers@habana.ai \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=galpress@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=izur@habana.ai \
    --cc=jgg@ziepe.ca \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=oded.gabbay@gmail.com \
    --cc=olof@lixom.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).