netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Nikolay Aleksandrov <nikolay@enfabrica.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Shrijeet Mukherjee <shrijeet@enfabrica.net>,
	alex.badea@keysight.com, eric.davis@broadcom.com,
	rip.sohan@amd.com, David Ahern <dsahern@kernel.org>,
	bmt@zurich.ibm.com, roland@enfabrica.net,
	Winston Liu <winston.liu@keysight.com>,
	dan.mihailescu@keysight.com, kheib@redhat.com,
	parth.v.parikh@keysight.com, davem@redhat.com,
	ian.ziemba@hpe.com, andrew.tauferner@cornelisnetworks.com,
	welch@hpe.com, rakhahari.bhunia@keysight.com,
	kingshuk.mandal@keysight.com, linux-rdma@vger.kernel.org,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: Netlink vs ioctl WAS(Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
Date: Mon, 17 Mar 2025 14:57:51 +0200	[thread overview]
Message-ID: <20250317125751.GW1322339@unreal> (raw)
In-Reply-To: <CAM0EoMnJW7zJ2_DBm2geTpTnc5ZenNgvcXkLn1eXk4Tu0H0R+A@mail.gmail.com>

On Sat, Mar 15, 2025 at 04:49:20PM -0400, Jamal Hadi Salim wrote:
> On Wed, Mar 12, 2025 at 11:11 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Mar 12, 2025 at 04:20:08PM +0200, Nikolay Aleksandrov wrote:
> > > On 3/12/25 1:29 PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 12, 2025 at 11:40:05AM +0200, Nikolay Aleksandrov wrote:
> > > >> On 3/8/25 8:46 PM, Leon Romanovsky wrote:
> > > >>> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > > [snip]
> > > >> Also we have the ephemeral PDC connections>> that come and go as
> > > needed. There more such objects coming with more
> > > >> state, configuration and lifecycle management. That is why we added a
> > > >> separate netlink family to cleanly manage them without trying to fit
> > > >> a square peg in a round hole so to speak.
> > > >
> > > > Yeah, I saw that you are planning to use netlink to manage objects,
> > > > which is very questionable. It is slow, unreliable, requires sockets,
> > > > needs more parsing logic e.t.c
> 
> To chime in on the above re: netlink vs ioctl,
> [this is going to be a long message - over caffeinated and stuck on a trip....]
> 
> On "slow" - Mostly netlink can be deemed to "slow" for the following
> reasons 1) locks - which over the last year have been highly reduced
> 2) crossing user/kernel - which i believe is fixable with some mmap
> scheme (although past attempts at doing this have been unsuccessful)
> 3)async vs ioctl sync (more below)
> 
> On "unreliable": This is typically a result of some request response
> (or a subscribed to event) whose execution has failed to allocate
> memory in the kernel or overrun some buffers towards user space;
> however, any such failures are signalled to user space and can be
> recovered from.
> 
> ioctl is synchronous which gives it the "reliability" and "speed".
> iirc, if memory failure was to happen on ioctl it will block until it
> is successful? vs netlink which is async and will get signalled to
> user space if data is lost or cant be fully delivered. Example, if a
> user issued a dump of a very large amount of data from the kernel and
> that data wasnt fully delivered perhaps because of memory pressure,
> user space will be notified via socket errors and can use that info to
> recover.
> 
> Extensibility: ioctl take binary structs which make it much harder to
> extend but adds to that "speed". Once you pick your struct, you are
> stuck with it - as opposed to netlink which uses very extensible
> formally defined TLVs that makes it highly extensible. Yes,
> extensibility requires more parsing as you stated above. Note: if you
> have one-offs you could just hardcode a ioctl-like data structure into
> a TLV and use blocking netlink sockets and that should get you pretty
> close to ioctl "speed"
> 
> To build more on reliability: if you really cared, there are
> mechanisms which can be used to build a fully reliable mechanism of
> communication with the kernel since netlink is infact a wire protocol
> (which alas has been broken for a while because you cant really use it
> as a wire protocol across machines); see for example:
> https://datatracker.ietf.org/doc/html/rfc3549#section-2.3.2.1
> And if you dont really care about reliability you can just shoot
> messages into the kernel and turn off the ACK flag (and then issue
> requests when you feel you need to check on configuration).
> 
> Debuggability: extended ACKs(heavily used by networking) provide an
> excellent operational information user space in fine grained details
> on errors (famous EINVAL can tell you exactly what the EINVAL means
> for example).
> 
> netlink has a multicast publish-subscribe mechanism. Multicast being
> one-to-many means multi-user(important detail for both scaling and
> independent debugging) interface. Meaning you can have multiple
> processes subscribing to events that the kernel publishes. You dont
> have to resort to polling the kernel for details of dynamic changes
> (example "a new entry has been added to table foo" etc)
> As a matter of fact, original design  used to allow user space to
> advertise to both kernel and other user space apps (and unicast worked
> to/from kernel/user and user/user). I haent looked at that recently,
> so it could be broken.
> Note: while these events are also subject to message loss - netlink
> robustness described earlier is usable here as well (via socket
> errors).
> Example, if the kernel attempted to send an event which had the
> misfortune of not making it - user will be notified and can recover by
> requesting a related table dump, etc to see what changed..
> 
> - And as Nik mentioned: The new (yaml)model-to-generatedcode approach
> that is now common in generic netlink highly reduces developer effort.
> Although in my opinion we really need this stuff integrated into tools
> like iproute2..
> 
> I am pretty sure i left out some important details (maybe i can write
> a small doc when i am in better shape).

Thanks for such a detailed answer. I'm not against netlink, I'm against
netlink to configure complex HW objects.

Thanks

> 
> cheers,
> jamal
> 

  reply	other threads:[~2025-03-17 12:57 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06 23:01 [RFC PATCH 00/13] Ultra Ethernet driver introduction Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 01/13] drivers: ultraeth: add initial skeleton and kconfig option Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 02/13] drivers: ultraeth: add context support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 03/13] drivers: ultraeth: add new genl family Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 04/13] drivers: ultraeth: add job support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 05/13] drivers: ultraeth: add tunnel udp device support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 06/13] drivers: ultraeth: add initial PDS infrastructure Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 07/13] drivers: ultraeth: add request and ack receive support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 08/13] drivers: ultraeth: add request transmit support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 09/13] drivers: ultraeth: add support for coalescing ack Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 10/13] drivers: ultraeth: add sack support Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 11/13] drivers: ultraeth: add nack support Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 12/13] drivers: ultraeth: add initiator and target idle timeout support Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 13/13] HACK: drivers: ultraeth: add char device Nikolay Aleksandrov
2025-03-08 18:46 ` [RFC PATCH 00/13] Ultra Ethernet driver introduction Leon Romanovsky
2025-03-09  3:21   ` Parav Pandit
2025-03-11 14:20     ` Bernard Metzler
2025-03-11 14:55       ` Leon Romanovsky
2025-03-11 17:11       ` Sean Hefty
2025-03-12  9:20         ` Nikolay Aleksandrov
2025-03-12  9:40   ` Nikolay Aleksandrov
2025-03-12 11:29     ` Leon Romanovsky
2025-03-12 14:20       ` Nikolay Aleksandrov
2025-03-12 15:10         ` Leon Romanovsky
2025-03-12 16:00           ` Nikolay Aleksandrov
2025-03-14 14:53           ` Bernard Metzler
2025-03-17 12:52             ` Leon Romanovsky
2025-03-19 13:52             ` Jason Gunthorpe
2025-03-19 14:02               ` Nikolay Aleksandrov
2025-03-14 20:51           ` Stanislav Fomichev
2025-03-17 12:30             ` Leon Romanovsky
2025-03-19 19:12               ` Stanislav Fomichev
2025-03-15 20:49           ` Netlink vs ioctl WAS(Re: " Jamal Hadi Salim
2025-03-17 12:57             ` Leon Romanovsky [this message]
2025-03-18 22:49             ` Jason Gunthorpe
2025-03-19 18:21               ` Jamal Hadi Salim
2025-03-19 19:19                 ` Jason Gunthorpe
2025-03-25 14:12                   ` Jamal Hadi Salim
2025-03-26 15:50                     ` Jason Gunthorpe
2025-04-08 14:16                       ` Jamal Hadi Salim
2025-04-09 16:10                         ` Jason Gunthorpe
2025-03-19 16:48 ` Jason Gunthorpe
2025-03-20 11:13   ` Yunsheng Lin
2025-03-20 14:32     ` Jason Gunthorpe
2025-03-20 20:05       ` Sean Hefty
2025-03-20 20:12         ` Jason Gunthorpe
2025-03-21  2:02           ` Yunsheng Lin
2025-03-21 12:01             ` Jason Gunthorpe
2025-03-24 20:22   ` Roland Dreier
2025-03-24 21:28     ` Sean Hefty
2025-03-25 13:22       ` Bernard Metzler
2025-03-25 17:02         ` Sean Hefty
2025-03-26 14:45           ` Jason Gunthorpe
2025-03-26 15:29             ` Sean Hefty
2025-03-26 15:53               ` Jason Gunthorpe
2025-03-26 17:39                 ` Sean Hefty
2025-03-27 13:26                   ` Jason Gunthorpe
2025-03-28 12:20                     ` Yunsheng Lin
2025-03-31 19:49                       ` Sean Hefty
2025-04-01  9:19                         ` Yunsheng Lin
2025-03-31 19:29                     ` Sean Hefty
2025-04-01 13:04                       ` Jason Gunthorpe
2025-04-01 16:57                         ` Sean Hefty
2025-04-01 19:39                           ` Jason Gunthorpe
2025-04-03  1:30                             ` Sean Hefty
2025-04-04 16:03                             ` Ziemba, Ian
2025-04-05  1:07                               ` Sean Hefty
2025-04-07 19:32                                 ` Ziemba, Ian
2025-04-08  4:40                                   ` Sean Hefty
2025-04-16 23:58                                   ` Sean Hefty
2025-04-17  1:23                                     ` Jason Gunthorpe
2025-04-17  2:59                                       ` Sean Hefty
2025-04-17 13:31                                         ` Jason Gunthorpe
2025-04-18 16:50                                           ` Sean Hefty
2025-04-22 15:44                                             ` Jason Gunthorpe
2025-03-26 15:16     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250317125751.GW1322339@unreal \
    --to=leon@kernel.org \
    --cc=alex.badea@keysight.com \
    --cc=andrew.tauferner@cornelisnetworks.com \
    --cc=bmt@zurich.ibm.com \
    --cc=dan.mihailescu@keysight.com \
    --cc=davem@redhat.com \
    --cc=dsahern@kernel.org \
    --cc=eric.davis@broadcom.com \
    --cc=ian.ziemba@hpe.com \
    --cc=jgg@nvidia.com \
    --cc=jhs@mojatatu.com \
    --cc=kheib@redhat.com \
    --cc=kingshuk.mandal@keysight.com \
    --cc=kuba@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@enfabrica.net \
    --cc=pabeni@redhat.com \
    --cc=parth.v.parikh@keysight.com \
    --cc=rakhahari.bhunia@keysight.com \
    --cc=rip.sohan@amd.com \
    --cc=roland@enfabrica.net \
    --cc=shrijeet@enfabrica.net \
    --cc=welch@hpe.com \
    --cc=winston.liu@keysight.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).