From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: Nikolay Aleksandrov <nikolay@enfabrica.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Shrijeet Mukherjee <shrijeet@enfabrica.net>,
alex.badea@keysight.com, eric.davis@broadcom.com,
rip.sohan@amd.com, David Ahern <dsahern@kernel.org>,
bmt@zurich.ibm.com, roland@enfabrica.net,
Winston Liu <winston.liu@keysight.com>,
dan.mihailescu@keysight.com, kheib@redhat.com,
parth.v.parikh@keysight.com, davem@redhat.com,
ian.ziemba@hpe.com, andrew.tauferner@cornelisnetworks.com,
welch@hpe.com, rakhahari.bhunia@keysight.com,
kingshuk.mandal@keysight.com, linux-rdma@vger.kernel.org,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Netlink vs ioctl WAS(Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
Date: Sat, 15 Mar 2025 16:49:20 -0400
Message-ID: <CAM0EoMnJW7zJ2_DBm2geTpTnc5ZenNgvcXkLn1eXk4Tu0H0R+A@mail.gmail.com>
In-Reply-To: <20250312151037.GE1322339@unreal>
On Wed, Mar 12, 2025 at 11:11 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Mar 12, 2025 at 04:20:08PM +0200, Nikolay Aleksandrov wrote:
> > On 3/12/25 1:29 PM, Leon Romanovsky wrote:
> > > On Wed, Mar 12, 2025 at 11:40:05AM +0200, Nikolay Aleksandrov wrote:
> > >> On 3/8/25 8:46 PM, Leon Romanovsky wrote:
> > >>> On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> > [snip]
> > >> Also we have the ephemeral PDC connections that come and go as
> > >> needed. There more such objects coming with more
> > >> state, configuration and lifecycle management. That is why we added a
> > >> separate netlink family to cleanly manage them without trying to fit
> > >> a square peg in a round hole so to speak.
> > >
> > > Yeah, I saw that you are planning to use netlink to manage objects,
> > > which is very questionable. It is slow, unreliable, requires sockets,
> > > needs more parsing logic e.t.c
To chime in on the above re: netlink vs ioctl,
[this is going to be a long message - over-caffeinated and stuck on a trip...]
On "slow": netlink can mostly be deemed "slow" for the following
reasons:
1) locks - which over the last year have been greatly reduced
2) crossing user/kernel - which I believe is fixable with some mmap
   scheme (although past attempts at doing this have been unsuccessful)
3) async vs ioctl sync (more below)
On "unreliable": this is typically the result of a request/response
(or a subscribed-to event) whose execution failed to allocate memory
in the kernel or overran some buffers towards user space; however, any
such failure is signalled to user space and can be recovered from.
ioctl is synchronous, which gives it the "reliability" and "speed".
iirc, if a memory failure were to happen on ioctl it will block until
it is successful? Netlink, by contrast, is async, and user space gets
signalled if data is lost or can't be fully delivered. For example, if
a user issued a dump of a very large amount of data from the kernel
and that data wasn't fully delivered, perhaps because of memory
pressure, user space will be notified via socket errors and can use
that info to recover.
Extensibility: ioctls take binary structs, which makes them much
harder to extend but adds to that "speed". Once you pick your struct,
you are stuck with it - as opposed to netlink, which uses formally
defined TLVs that make it highly extensible. Yes, extensibility
requires more parsing, as you stated above. Note: if you have one-offs
you could just hardcode an ioctl-like data structure into a TLV and
use blocking netlink sockets, and that should get you pretty close to
ioctl "speed".
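To make the TLV point concrete, here is a minimal sketch of netlink-style
attribute encoding (a 4-byte header of length and type, with the payload
padded to 4 bytes). The attribute IDs and the "pdc0" value are
hypothetical, made up for illustration and not taken from any real family.
It shows a reader written against an older attribute set still parsing a
message that carries a newer attribute it does not know about.

```python
import struct

NLA_HDR = struct.Struct("=HH")  # nla_len, nla_type in host byte order


def nla_align(n):
    """Round n up to the 4-byte netlink attribute alignment."""
    return (n + 3) & ~3


def put_attr(attr_type, payload):
    """Encode one attribute: header, payload, zero padding."""
    length = NLA_HDR.size + len(payload)  # nla_len excludes the padding
    pad = nla_align(length) - length
    return NLA_HDR.pack(length, attr_type) + payload + b"\0" * pad


def parse_attrs(buf):
    """Walk a flat attribute stream and return {type: payload}."""
    attrs = {}
    off = 0
    while off + NLA_HDR.size <= len(buf):
        length, attr_type = NLA_HDR.unpack_from(buf, off)
        if length < NLA_HDR.size or off + length > len(buf):
            raise ValueError("truncated attribute")
        attrs[attr_type] = buf[off + NLA_HDR.size:off + length]
        off += nla_align(length)
    return attrs


# Hypothetical attribute IDs for an imaginary "foo" object.
FOO_ATTR_ID = 1
FOO_ATTR_NAME = 2
FOO_ATTR_MTU = 3  # added later; the old reader has never heard of it


def old_reader(buf):
    """A reader that only knows ID and NAME; unknown types are ignored."""
    attrs = parse_attrs(buf)
    ident = struct.unpack("=I", attrs[FOO_ATTR_ID])[0]
    name = attrs[FOO_ATTR_NAME].rstrip(b"\0").decode()
    return ident, name


v2_msg = (put_attr(FOO_ATTR_ID, struct.pack("=I", 7))
          + put_attr(FOO_ATTR_NAME, b"pdc0\0")
          + put_attr(FOO_ATTR_MTU, struct.pack("=I", 4096)))

print(old_reader(v2_msg))
```

With a fixed binary struct, adding the MTU field would change the layout
that every existing caller was compiled against; here the old reader just
steps over the attribute it does not recognise.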
To build more on reliability: if you really cared, there are
mechanisms which can be used to build fully reliable communication
with the kernel, since netlink is in fact a wire protocol (which alas
has been broken for a while, because you can't really use it as a wire
protocol across machines); see for example:
https://datatracker.ietf.org/doc/html/rfc3549#section-2.3.2.1
And if you don't really care about reliability you can just shoot
messages into the kernel with the ACK flag turned off (and then issue
requests when you feel you need to check on configuration).
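As a rough illustration of the ACK flag, the sketch below builds the
16-byte netlink message header with and without NLM_F_ACK. The flag values
are the ones from the netlink uapi header; the message type and payload are
hypothetical placeholders, and nothing is actually sent to the kernel.

```python
import struct

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid

NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04

HYPOTHETICAL_MSG_TYPE = 0x20  # placeholder, not a real family's type


def build_request(msg_type, payload, seq, want_ack=True):
    """Prefix payload with a netlink header, optionally asking for an ACK."""
    flags = NLM_F_REQUEST | (NLM_F_ACK if want_ack else 0)
    # pid 0: let the kernel fill in / ignore the sender port id.
    hdr = NLMSGHDR.pack(NLMSGHDR.size + len(payload), msg_type, flags, seq, 0)
    return hdr + payload


# The payload is 4 bytes here, so it needs no extra alignment padding.
acked = build_request(HYPOTHETICAL_MSG_TYPE, b"\0" * 4, seq=1)
fire_and_forget = build_request(HYPOTHETICAL_MSG_TYPE, b"\0" * 4, seq=2,
                                want_ack=False)

print(hex(NLMSGHDR.unpack_from(acked)[2]),
      hex(NLMSGHDR.unpack_from(fire_and_forget)[2]))
```

The only difference between the two requests is one bit in the flags
field, so the choice between "reliable" and "fire and forget" can be made
per message.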
Debuggability: extended ACKs (heavily used by networking) provide
excellent operational information to user space, with fine-grained
detail on errors (the famous EINVAL can tell you exactly what the
EINVAL means, for example).
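A minimal sketch of what an extended ACK looks like on the receiving side,
assuming the socket has enabled the NETLINK_EXT_ACK option. The constants
are from the netlink uapi header; the error string and the sample message
are fabricated for the demo and do not come from any real subsystem.

```python
import errno
import struct

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
NLA_HDR = struct.Struct("=HH")      # nla_len, nla_type

NLMSG_ERROR = 0x2
NLM_F_CAPPED = 0x100    # only the original header is echoed back
NLM_F_ACK_TLVS = 0x200  # extended ACK attributes follow
NLMSGERR_ATTR_MSG = 1   # human-readable error string


def align4(n):
    return (n + 3) & ~3


def parse_error(msg):
    """Return (errno, extack_text_or_None) from one NLMSG_ERROR message."""
    length, msg_type, flags, _seq, _pid = NLMSGHDR.unpack_from(msg, 0)
    if msg_type != NLMSG_ERROR:
        raise ValueError("not an NLMSG_ERROR message")
    off = NLMSGHDR.size
    (err,) = struct.unpack_from("=i", msg, off)  # negative errno, 0 for ACK
    off += 4
    orig_len = NLMSGHDR.unpack_from(msg, off)[0]
    # Skip the echoed request: header only if capped, else the whole thing.
    off += NLMSGHDR.size if flags & NLM_F_CAPPED else align4(orig_len)
    text = None
    if flags & NLM_F_ACK_TLVS:
        while off + NLA_HDR.size <= length:
            nla_len, nla_type = NLA_HDR.unpack_from(msg, off)
            if nla_len < NLA_HDR.size:
                break
            if nla_type == NLMSGERR_ATTR_MSG:
                raw = msg[off + NLA_HDR.size:off + nla_len]
                text = raw.rstrip(b"\0").decode()
            off += align4(nla_len)
    return -err, text


def sample_error():
    """Fabricate an error reply for the demo (not captured from a kernel)."""
    orig_hdr = NLMSGHDR.pack(20, 0x20, 0x05, 1, 0)
    extack = b"hypothetical: MTU out of range\0"
    attr = NLA_HDR.pack(NLA_HDR.size + len(extack), NLMSGERR_ATTR_MSG) + extack
    attr += b"\0" * (align4(len(attr)) - len(attr))
    body = struct.pack("=i", -errno.EINVAL) + orig_hdr + attr
    hdr = NLMSGHDR.pack(NLMSGHDR.size + len(body), NLMSG_ERROR,
                        NLM_F_CAPPED | NLM_F_ACK_TLVS, 1, 0)
    return hdr + body


print(parse_error(sample_error()))
```

The errno alone would only say EINVAL; the attribute carries the reason
alongside it, which is the fine-grained detail referred to above.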
netlink has a multicast publish-subscribe mechanism. Multicast being
one-to-many means a multi-user interface (an important detail for both
scaling and independent debugging): you can have multiple processes
subscribing to events that the kernel publishes. You don't have to
resort to polling the kernel for details of dynamic changes (example:
"a new entry has been added to table foo", etc).
As a matter of fact, the original design allowed user space to
advertise to both the kernel and other user space apps (and unicast
worked to/from kernel/user and user/user). I haven't looked at that
recently, so it could be broken.
Note: while these events are also subject to message loss, the netlink
robustness described earlier is usable here as well (via socket
errors).
For example, if the kernel attempted to send an event which had the
misfortune of not making it, the user will be notified and can recover
by requesting a related table dump, etc. to see what changed.
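The recovery pattern described here can be sketched as a small loop. The
assumption, taken from netlink(7) rather than from this message, is that
the "socket error" for a lost event is ENOBUFS returned by the receive
call. The recv/resync/apply callables and the tables are hypothetical
stand-ins, so the sketch runs without a real netlink socket.

```python
import errno


def follow_events(recv, resync, apply_event, max_reads):
    """Consume events; on an overrun, rebuild state from a full dump."""
    for _ in range(max_reads):
        try:
            event = recv()
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise  # some other failure, not a lost-event signal
            resync()   # at least one event was dropped: re-dump the table
            continue
        apply_event(event)


# --- Stand-ins for the kernel side, for demonstration only ---
kernel_table = {"a": 1, "b": 2, "c": 3}  # what the kernel really holds
shadow = {"a": 1}                        # user-space copy kept via events

# The event for "c" is "lost"; the socket reports ENOBUFS instead.
pending = [("b", 2), OSError(errno.ENOBUFS, "receive buffer overrun")]


def fake_recv():
    item = pending.pop(0)
    if isinstance(item, OSError):
        raise item
    return item


def fake_resync():
    shadow.clear()
    shadow.update(kernel_table)  # equivalent of requesting a table dump


def fake_apply(event):
    key, value = event
    shadow[key] = value


follow_events(fake_recv, fake_resync, fake_apply, max_reads=2)
print(shadow == kernel_table)
```

The subscriber never learns which event was lost, only that something
was, which is why the recovery step is a full dump and not a replay.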
And as Nik mentioned: the new (YAML) model-to-generated-code approach
that is now common in generic netlink greatly reduces developer effort.
Although in my opinion we really need this stuff integrated into tools
like iproute2.
I am pretty sure I left out some important details (maybe I can write
a small doc when I am in better shape).
cheers,
jamal