* Re: mlx5 ConnectX diagnostic misc driver
@ 2023-10-19 15:24 Jakub Kicinski
2023-10-19 15:36 ` Greg Kroah-Hartman
2023-10-19 16:10 ` Saeed Mahameed
0 siblings, 2 replies; 8+ messages in thread
From: Jakub Kicinski @ 2023-10-19 15:24 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Greg Kroah-Hartman, Leon Romanovsky, Jason Gunthorpe, Jiri Pirko,
linux-kernel, Arnd Bergmann
> The ConnectX HW family supported by the mlx5 drivers uses an architecture
> where a FW component executes "mailbox RPCs" issued by the driver to make
> changes to the device. This results in a complex debugging environment
> where the FW component has information and complex low level state that
> needs to be accessed to userspace for debugging purposes.
You're being very dishonest towards Greg by not telling him that this
is a networking device, and the networking maintainers explicitly nacked
this backdoor. Nacked it, because you can't answer basic questions like
"what are the use cases" with more than "custom config and debug".
Whether Greg wants to take this into the "misc" pile is entirely up
to him, but you gotta give him more context.
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 15:24 mlx5 ConnectX diagnostic misc driver Jakub Kicinski @ 2023-10-19 15:36 ` Greg Kroah-Hartman 2023-10-19 16:01 ` Jason Gunthorpe 2023-10-19 17:11 ` Leon Romanovsky 2023-10-19 16:10 ` Saeed Mahameed 1 sibling, 2 replies; 8+ messages in thread From: Greg Kroah-Hartman @ 2023-10-19 15:36 UTC (permalink / raw) To: Jakub Kicinski Cc: Saeed Mahameed, Leon Romanovsky, Jason Gunthorpe, Jiri Pirko, linux-kernel, Arnd Bergmann On Thu, Oct 19, 2023 at 08:24:51AM -0700, Jakub Kicinski wrote: > > The ConnectX HW family supported by the mlx5 drivers uses an architecture > > where a FW component executes "mailbox RPCs" issued by the driver to make > > changes to the device. This results in a complex debugging environment > > where the FW component has information and complex low level state that > > needs to be accessed to userspace for debugging purposes. > > You're being very dishonest towards Greg by not telling him that this > is a networking device, and the networking maintainers explicitly nacked > this backdoor. Nacked it, because you can't answer basic questions like > "what are the use cases" with more than "custom config and debug". > > Whether Greg wants to take this into the "misc" pile is entirely up > to him, but you gotta give him more context. Well, in this case, no way in hell will I be taking this. If this is a networking device, it needs to go through the normal networking driver review process, thanks for the heads up. greg k-h ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 15:36 ` Greg Kroah-Hartman @ 2023-10-19 16:01 ` Jason Gunthorpe 2023-10-19 16:23 ` Jakub Kicinski 2023-10-19 17:11 ` Leon Romanovsky 1 sibling, 1 reply; 8+ messages in thread From: Jason Gunthorpe @ 2023-10-19 16:01 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jakub Kicinski, Saeed Mahameed, Leon Romanovsky, Jiri Pirko, linux-kernel, Arnd Bergmann On Thu, Oct 19, 2023 at 05:36:04PM +0200, Greg Kroah-Hartman wrote: > On Thu, Oct 19, 2023 at 08:24:51AM -0700, Jakub Kicinski wrote: > > > The ConnectX HW family supported by the mlx5 drivers uses an architecture > > > where a FW component executes "mailbox RPCs" issued by the driver to make > > > changes to the device. This results in a complex debugging environment > > > where the FW component has information and complex low level state that > > > needs to be accessed to userspace for debugging purposes. > > > > You're being very dishonest towards Greg by not telling him that this > > is a networking device, and the networking maintainers explicitly nacked > > this backdoor. Do you have a lore link? > Well, in this case, no way in hell will I be taking this. If this is a > networking device, it needs to go through the normal networking driver > review process, thanks for the heads up. It is not just a networking device. mlx5 is a giant and complex multi-subsystem piece of hardware. This is shared debugging and configuration interface to the device FW that interacts across all of the different subsystems the driver supports. Looking at Saeed's tool capability on his github it is significantly, but not exclusively supporting RDMA (ie drivers/infiniband), with some features for the mlx5 VFIO drivers, mlx5 VDPA and a bunch of lowlevel PCI stuff too. Calling it a "networking device" in the sense of "it is owned only be netdev" is not accurate. We think misc is an appropriate place to put something like this, there are many other misc drivers that are sort of similar APIs to access embedded FW to manage and debug it. Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 16:01 ` Jason Gunthorpe @ 2023-10-19 16:23 ` Jakub Kicinski 2023-10-19 16:50 ` Jason Gunthorpe 2023-10-19 17:59 ` Saeed Mahameed 0 siblings, 2 replies; 8+ messages in thread From: Jakub Kicinski @ 2023-10-19 16:23 UTC (permalink / raw) To: Jason Gunthorpe Cc: Greg Kroah-Hartman, Saeed Mahameed, Leon Romanovsky, Jiri Pirko, linux-kernel, Arnd Bergmann On Thu, 19 Oct 2023 13:01:45 -0300 Jason Gunthorpe wrote: > Do you have a lore link? No, it was pitched at conferences: Last year's LPC: https://www.youtube.com/watch?v=JGR9ZCeiW-E This year's netconf / KR, but Saeed didn't have slides: https://netdev.bots.linux.dev/netconf/2023/index.html Really, you should be asking Saeed this, not me. > Looking at Saeed's tool capability on his github it is significantly, > but not exclusively supporting RDMA (ie drivers/infiniband), with some > features for the mlx5 VFIO drivers, mlx5 VDPA and a bunch of lowlevel > PCI stuff too. > > Calling it a "networking device" in the sense of "it is owned only be > netdev" is not accurate. Yes, let's now have a pointless augment about when a NIC is a NIC and when it's no longer a NIC because it has offloads. My point is Saeed pitched this to networking maintainers *twice* and then purposefully left us out of the CC list. That is absolutely unacceptable IMO, but I'd like to consult with others to make sure it's not just me. Please allow me some time to do that. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 16:23 ` Jakub Kicinski @ 2023-10-19 16:50 ` Jason Gunthorpe 2023-10-19 17:59 ` Saeed Mahameed 1 sibling, 0 replies; 8+ messages in thread From: Jason Gunthorpe @ 2023-10-19 16:50 UTC (permalink / raw) To: Jakub Kicinski Cc: Greg Kroah-Hartman, Saeed Mahameed, Leon Romanovsky, Jiri Pirko, linux-kernel, Arnd Bergmann On Thu, Oct 19, 2023 at 09:23:46AM -0700, Jakub Kicinski wrote: > On Thu, 19 Oct 2023 13:01:45 -0300 Jason Gunthorpe wrote: > > Do you have a lore link? > > No, it was pitched at conferences: > > Last year's LPC: https://www.youtube.com/watch?v=JGR9ZCeiW-E > This year's netconf / KR, but Saeed didn't have slides: > https://netdev.bots.linux.dev/netconf/2023/index.html > > Really, you should be asking Saeed this, not me. You brought it up. I was at that LPC talk, and you made it clear not to even propose using devlink. Seems like that is what Saeed did to me. > > Looking at Saeed's tool capability on his github it is significantly, > > but not exclusively supporting RDMA (ie drivers/infiniband), with some > > features for the mlx5 VFIO drivers, mlx5 VDPA and a bunch of lowlevel > > PCI stuff too. > > > > Calling it a "networking device" in the sense of "it is owned only be > > netdev" is not accurate. > > Yes, let's now have a pointless augment about when a NIC is a NIC > and when it's no longer a NIC because it has offloads. This is not a pointless argument. How mlx5 hardware works in the many subsystems it works with is NOT your exclusive decision to dictate just because it has a shared networking port on the back of the card. Lets focus on a community conensus approach between all the subsystems please. If you have a better suggestion how to address this cross-subsystem need then please share it. > My point is Saeed pitched this to networking maintainers *twice* I read that as for a devlink version, this is clearly not devlink. > My point is Saeed pitched this to networking maintainers *twice* > and then purposefully left us out of the CC list. > That is absolutely unacceptable IMO, but I'd like to consult with > others to make sure it's not just me. Please allow me some time > to do that. It is standard practice not to CC'd netdev for mlx5 related work outside the scope of the netdev driver. Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 16:23 ` Jakub Kicinski 2023-10-19 16:50 ` Jason Gunthorpe @ 2023-10-19 17:59 ` Saeed Mahameed 1 sibling, 0 replies; 8+ messages in thread From: Saeed Mahameed @ 2023-10-19 17:59 UTC (permalink / raw) To: Jakub Kicinski Cc: Jason Gunthorpe, Greg Kroah-Hartman, Leon Romanovsky, Jiri Pirko, linux-kernel, Arnd Bergmann On 19 Oct 09:23, Jakub Kicinski wrote: >On Thu, 19 Oct 2023 13:01:45 -0300 Jason Gunthorpe wrote: >> Do you have a lore link? > >No, it was pitched at conferences: > >Last year's LPC: https://www.youtube.com/watch?v=JGR9ZCeiW-E Yes this was one of the problems I discussed at LPC, please keep in mind that I raised multiple other unrelated problems too. In this project we discuss the debug-ability features only, The orchestration features are a separate project that is still on-going work to be under devlink. We all agreed that devlink isn't the place for such vast device debug options, hence this is our proposal, a separate aux device that works in unison with other aux devices such as virtio, vpda,rdma,netdev, etc .. >This year's netconf / KR, but Saeed didn't have slides: >https://netdev.bots.linux.dev/netconf/2023/index.html > At netconf/KR I only discussed the orchestration project we are working on to improve an going work we are doing, multi netdev interface, such as the socket direct issue we discussed, this has nothing to do with this project proposal which cover RDMA,VDPA,netdev,virtio,Blue-Field, etc .. ConnectX debug-ability and diagnosis problem. >Really, you should be asking Saeed this, not me. > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 15:36 ` Greg Kroah-Hartman 2023-10-19 16:01 ` Jason Gunthorpe @ 2023-10-19 17:11 ` Leon Romanovsky 1 sibling, 0 replies; 8+ messages in thread From: Leon Romanovsky @ 2023-10-19 17:11 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jakub Kicinski, Saeed Mahameed, Jason Gunthorpe, Jiri Pirko, linux-kernel, Arnd Bergmann On Thu, Oct 19, 2023 at 05:36:04PM +0200, Greg Kroah-Hartman wrote: > On Thu, Oct 19, 2023 at 08:24:51AM -0700, Jakub Kicinski wrote: > > > The ConnectX HW family supported by the mlx5 drivers uses an architecture > > > where a FW component executes "mailbox RPCs" issued by the driver to make > > > changes to the device. This results in a complex debugging environment > > > where the FW component has information and complex low level state that > > > needs to be accessed to userspace for debugging purposes. > > > > You're being very dishonest towards Greg by not telling him that this > > is a networking device, and the networking maintainers explicitly nacked > > this backdoor. Nacked it, because you can't answer basic questions like > > "what are the use cases" with more than "custom config and debug". > > > > Whether Greg wants to take this into the "misc" pile is entirely up > > to him, but you gotta give him more context. > > Well, in this case, no way in hell will I be taking this. If this is a > networking device, It is not networking device. Only by coincidence, the PCI core of mlx5 driver is located in netdev, and it serves other subsystems as well (VFIO, VDPA, RDMA, e.t.c.). Just as a reminder, we added auxiliary bus to make sure that this device is properly decoupled across various subsystems. > it needs to go through the normal networking driver review process, > thanks for the heads up. That process is not relevant here, not everything in the world is netdev. Thanks > > greg k-h ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: mlx5 ConnectX diagnostic misc driver 2023-10-19 15:24 mlx5 ConnectX diagnostic misc driver Jakub Kicinski 2023-10-19 15:36 ` Greg Kroah-Hartman @ 2023-10-19 16:10 ` Saeed Mahameed 1 sibling, 0 replies; 8+ messages in thread From: Saeed Mahameed @ 2023-10-19 16:10 UTC (permalink / raw) To: Jakub Kicinski Cc: Greg Kroah-Hartman, Leon Romanovsky, Jason Gunthorpe, Jiri Pirko, linux-kernel, Arnd Bergmann On 19 Oct 08:24, Jakub Kicinski wrote: >> The ConnectX HW family supported by the mlx5 drivers uses an architecture >> where a FW component executes "mailbox RPCs" issued by the driver to make >> changes to the device. This results in a complex debugging environment >> where the FW component has information and complex low level state that >> needs to be accessed to userspace for debugging purposes. > >You're being very dishonest towards Greg by not telling him that this >is a networking device, and the networking maintainers explicitly nacked This is not a netwroking only device, as described in the tools documentation quoting: "ConnectX devices are complex and provide a vast set of features and components (SmartNiCs SoCs, Multi-protocol Network Adapters with Ethernent, Infiniband, Storage, DPU, and many acceleration and offload capabilities). This project will provide unified tool set to access, debug, diagnose and monitor those devices using the ConnectX architecture and onboard processors and firmware." mlx5 has multiple drivers in multiple subsystems, it's not only networking or netdev, a huge part of mlx5 is the mlx5 RDMA driver another is mlx5 vdpa driver, and the list goes on, including virtio/vfio. ConnectX supports multiple types of PF/VFs/SFs and at least a dozen of ULPs and aux devices. A unified diag driver that provideslow level understanding of the ConnectX architecture is the best way to go here, so it can serve everyone using a ConnectX device, regardless of the subsystem it is being used for.. >this backdoor. Nacked it, because you can't answer basic questions like >"what are the use cases" with more than "custom config and debug". > I think the tools project provides a clear picture on what the uses cases are, it is already part of this cover-letter: Please see the documentation, and let me know if you have other questions: https://github.com/saeedtx/mlx5ctl >Whether Greg wants to take this into the "misc" pile is entirely up >to him, but you gotta give him more context. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-10-19 17:59 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-19 15:24 mlx5 ConnectX diagnostic misc driver Jakub Kicinski 2023-10-19 15:36 ` Greg Kroah-Hartman 2023-10-19 16:01 ` Jason Gunthorpe 2023-10-19 16:23 ` Jakub Kicinski 2023-10-19 16:50 ` Jason Gunthorpe 2023-10-19 17:59 ` Saeed Mahameed 2023-10-19 17:11 ` Leon Romanovsky 2023-10-19 16:10 ` Saeed Mahameed
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox