All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Nikolay Aleksandrov <nikolay@enfabrica.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Shrijeet Mukherjee <shrijeet@enfabrica.net>,
	alex.badea@keysight.com, eric.davis@broadcom.com,
	rip.sohan@amd.com, David Ahern <dsahern@kernel.org>,
	bmt@zurich.ibm.com, roland@enfabrica.net,
	Winston Liu <winston.liu@keysight.com>,
	dan.mihailescu@keysight.com, kheib@redhat.com,
	parth.v.parikh@keysight.com, davem@redhat.com,
	ian.ziemba@hpe.com, andrew.tauferner@cornelisnetworks.com,
	welch@hpe.com, rakhahari.bhunia@keysight.com,
	kingshuk.mandal@keysight.com, linux-rdma@vger.kernel.org,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: Netlink vs ioctl WAS(Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
Date: Mon, 17 Mar 2025 14:57:51 +0200	[thread overview]
Message-ID: <20250317125751.GW1322339@unreal> (raw)
In-Reply-To: <CAM0EoMnJW7zJ2_DBm2geTpTnc5ZenNgvcXkLn1eXk4Tu0H0R+A@mail.gmail.com>

On Sat, Mar 15, 2025 at 04:49:20PM -0400, Jamal Hadi Salim wrote:
> On Wed, Mar 12, 2025 at 11:11 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Mar 12, 2025 at 04:20:08PM +0200, Nikolay Aleksandrov wrote:
> > > On 3/12/25 1:29 PM, Leon Romanovsky wrote:
> > > > On Wed, Mar 12, 2025 at 11:40:05AM +0200, Nikolay Aleksandrov wrote:
> > > >> On 3/8/25 8:46 PM, Leon Romanovsky wrote:
> > > >>> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > > [snip]
> > > >> Also we have the ephemeral PDC connections>> that come and go as
> > > needed. There more such objects coming with more
> > > >> state, configuration and lifecycle management. That is why we added a
> > > >> separate netlink family to cleanly manage them without trying to fit
> > > >> a square peg in a round hole so to speak.
> > > >
> > > > Yeah, I saw that you are planning to use netlink to manage objects,
> > > > which is very questionable. It is slow, unreliable, requires sockets,
> > > > needs more parsing logic e.t.c
> 
> To chime in on the above re: netlink vs ioctl,
> [this is going to be a long message - over caffeinated and stuck on a trip....]
> 
> On "slow" - Mostly netlink can be deemed to "slow" for the following
> reasons 1) locks - which over the last year have been highly reduced
> 2) crossing user/kernel - which i believe is fixable with some mmap
> scheme (although past attempts at doing this have been unsuccessful)
> 3)async vs ioctl sync (more below)
> 
> On "unreliable": This is typically a result of some request response
> (or a subscribed to event) whose execution has failed to allocate
> memory in the kernel or overrun some buffers towards user space;
> however, any such failures are signalled to user space and can be
> recovered from.
> 
> ioctl is synchronous which gives it the "reliability" and "speed".
> iirc, if memory failure was to happen on ioctl it will block until it
> is successful? vs netlink which is async and will get signalled to
> user space if data is lost or cant be fully delivered. Example, if a
> user issued a dump of a very large amount of data from the kernel and
> that data wasnt fully delivered perhaps because of memory pressure,
> user space will be notified via socket errors and can use that info to
> recover.
> 
> Extensibility: ioctl take binary structs which make it much harder to
> extend but adds to that "speed". Once you pick your struct, you are
> stuck with it - as opposed to netlink which uses very extensible
> formally defined TLVs that makes it highly extensible. Yes,
> extensibility requires more parsing as you stated above. Note: if you
> have one-offs you could just hardcode a ioctl-like data structure into
> a TLV and use blocking netlink sockets and that should get you pretty
> close to ioctl "speed"
> 
> To build more on reliability: if you really cared, there are
> mechanisms which can be used to build a fully reliable mechanism of
> communication with the kernel since netlink is infact a wire protocol
> (which alas has been broken for a while because you cant really use it
> as a wire protocol across machines); see for example:
> https://datatracker.ietf.org/doc/html/rfc3549#section-2.3.2.1
> And if you dont really care about reliability you can just shoot
> messages into the kernel and turn off the ACK flag (and then issue
> requests when you feel you need to check on configuration).
> 
> Debuggability: extended ACKs(heavily used by networking) provide an
> excellent operational information user space in fine grained details
> on errors (famous EINVAL can tell you exactly what the EINVAL means
> for example).
> 
> netlink has a multicast publish-subscribe mechanism. Multicast being
> one-to-many means multi-user(important detail for both scaling and
> independent debugging) interface. Meaning you can have multiple
> processes subscribing to events that the kernel publishes. You dont
> have to resort to polling the kernel for details of dynamic changes
> (example "a new entry has been added to table foo" etc)
> As a matter of fact, original design  used to allow user space to
> advertise to both kernel and other user space apps (and unicast worked
> to/from kernel/user and user/user). I haent looked at that recently,
> so it could be broken.
> Note: while these events are also subject to message loss - netlink
> robustness described earlier is usable here as well (via socket
> errors).
> Example, if the kernel attempted to send an event which had the
> misfortune of not making it - user will be notified and can recover by
> requesting a related table dump, etc to see what changed..
> 
> - And as Nik mentioned: The new (yaml)model-to-generatedcode approach
> that is now common in generic netlink highly reduces developer effort.
> Although in my opinion we really need this stuff integrated into tools
> like iproute2..
> 
> I am pretty sure i left out some important details (maybe i can write
> a small doc when i am in better shape).

Thanks for such a detailed answer. I'm not against netlink, I'm against
netlink to configure complex HW objects.

Thanks

> 
> cheers,
> jamal
> 

  reply	other threads:[~2025-03-17 12:57 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06 23:01 [RFC PATCH 00/13] Ultra Ethernet driver introduction Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 01/13] drivers: ultraeth: add initial skeleton and kconfig option Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 02/13] drivers: ultraeth: add context support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 03/13] drivers: ultraeth: add new genl family Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 04/13] drivers: ultraeth: add job support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 05/13] drivers: ultraeth: add tunnel udp device support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 06/13] drivers: ultraeth: add initial PDS infrastructure Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 07/13] drivers: ultraeth: add request and ack receive support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 08/13] drivers: ultraeth: add request transmit support Nikolay Aleksandrov
2025-03-06 23:01 ` [RFC PATCH 09/13] drivers: ultraeth: add support for coalescing ack Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 10/13] drivers: ultraeth: add sack support Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 11/13] drivers: ultraeth: add nack support Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 12/13] drivers: ultraeth: add initiator and target idle timeout support Nikolay Aleksandrov
2025-03-06 23:02 ` [RFC PATCH 13/13] HACK: drivers: ultraeth: add char device Nikolay Aleksandrov
2025-03-08 18:46 ` [RFC PATCH 00/13] Ultra Ethernet driver introduction Leon Romanovsky
2025-03-09  3:21   ` Parav Pandit
2025-03-11 14:20     ` Bernard Metzler
2025-03-11 14:55       ` Leon Romanovsky
2025-03-11 17:11       ` Sean Hefty
2025-03-12  9:20         ` Nikolay Aleksandrov
2025-03-12  9:40   ` Nikolay Aleksandrov
2025-03-12 11:29     ` Leon Romanovsky
2025-03-12 14:20       ` Nikolay Aleksandrov
2025-03-12 15:10         ` Leon Romanovsky
2025-03-12 16:00           ` Nikolay Aleksandrov
2025-03-14 14:53           ` Bernard Metzler
2025-03-17 12:52             ` Leon Romanovsky
2025-03-19 13:52             ` Jason Gunthorpe
2025-03-19 14:02               ` Nikolay Aleksandrov
2025-03-14 20:51           ` Stanislav Fomichev
2025-03-17 12:30             ` Leon Romanovsky
2025-03-19 19:12               ` Stanislav Fomichev
2025-03-15 20:49           ` Netlink vs ioctl WAS(Re: " Jamal Hadi Salim
2025-03-17 12:57             ` Leon Romanovsky [this message]
2025-03-18 22:49             ` Jason Gunthorpe
2025-03-19 18:21               ` Jamal Hadi Salim
2025-03-19 19:19                 ` Jason Gunthorpe
2025-03-25 14:12                   ` Jamal Hadi Salim
2025-03-26 15:50                     ` Jason Gunthorpe
2025-04-08 14:16                       ` Jamal Hadi Salim
2025-04-09 16:10                         ` Jason Gunthorpe
2025-03-19 16:48 ` Jason Gunthorpe
2025-03-20 11:13   ` Yunsheng Lin
2025-03-20 14:32     ` Jason Gunthorpe
2025-03-20 20:05       ` Sean Hefty
2025-03-20 20:12         ` Jason Gunthorpe
2025-03-21  2:02           ` Yunsheng Lin
2025-03-21 12:01             ` Jason Gunthorpe
2025-03-24 20:22   ` Roland Dreier
2025-03-24 21:28     ` Sean Hefty
2025-03-25 13:22       ` Bernard Metzler
2025-03-25 17:02         ` Sean Hefty
2025-03-26 14:45           ` Jason Gunthorpe
2025-03-26 15:29             ` Sean Hefty
2025-03-26 15:53               ` Jason Gunthorpe
2025-03-26 17:39                 ` Sean Hefty
2025-03-27 13:26                   ` Jason Gunthorpe
2025-03-28 12:20                     ` Yunsheng Lin
2025-03-31 19:49                       ` Sean Hefty
2025-04-01  9:19                         ` Yunsheng Lin
2025-03-31 19:29                     ` Sean Hefty
2025-04-01 13:04                       ` Jason Gunthorpe
2025-04-01 16:57                         ` Sean Hefty
2025-04-01 19:39                           ` Jason Gunthorpe
2025-04-03  1:30                             ` Sean Hefty
2025-04-04 16:03                             ` Ziemba, Ian
2025-04-05  1:07                               ` Sean Hefty
2025-04-07 19:32                                 ` Ziemba, Ian
2025-04-08  4:40                                   ` Sean Hefty
2025-04-16 23:58                                   ` Sean Hefty
2025-04-17  1:23                                     ` Jason Gunthorpe
2025-04-17  2:59                                       ` Sean Hefty
2025-04-17 13:31                                         ` Jason Gunthorpe
2025-04-18 16:50                                           ` Sean Hefty
2025-04-22 15:44                                             ` Jason Gunthorpe
2025-03-26 15:16     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250317125751.GW1322339@unreal \
    --to=leon@kernel.org \
    --cc=alex.badea@keysight.com \
    --cc=andrew.tauferner@cornelisnetworks.com \
    --cc=bmt@zurich.ibm.com \
    --cc=dan.mihailescu@keysight.com \
    --cc=davem@redhat.com \
    --cc=dsahern@kernel.org \
    --cc=eric.davis@broadcom.com \
    --cc=ian.ziemba@hpe.com \
    --cc=jgg@nvidia.com \
    --cc=jhs@mojatatu.com \
    --cc=kheib@redhat.com \
    --cc=kingshuk.mandal@keysight.com \
    --cc=kuba@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@enfabrica.net \
    --cc=pabeni@redhat.com \
    --cc=parth.v.parikh@keysight.com \
    --cc=rakhahari.bhunia@keysight.com \
    --cc=rip.sohan@amd.com \
    --cc=roland@enfabrica.net \
    --cc=shrijeet@enfabrica.net \
    --cc=welch@hpe.com \
    --cc=winston.liu@keysight.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.