From: Joe Damato <jdamato@fastly.com>
To: Alejandro Colomar <alx@kernel.org>
Cc: linux-man@vger.kernel.org
Subject: Re: [PATCH 1/1] ioctl_epoll.2: New page describing ioctl(2) operations for epoll fds
Date: Thu, 6 Jun 2024 19:06:05 -0700 [thread overview]
Message-ID: <ZmJrDd-TnNxc1y4-@LQ3V64L9R2> (raw)
In-Reply-To: <xofdfsokfmqtvp47d4oqtsplm5jvx6iu4xfompgilvsno3pqlv@vwpqcnpk3g2v>
On Thu, Jun 06, 2024 at 11:39:58PM +0200, Alejandro Colomar wrote:
> Hi Joe,
>
> On Tue, Jun 04, 2024 at 06:17:40PM GMT, Joe Damato wrote:
> > Signed-off-by: Joe Damato <jdamato@fastly.com>
>
> Thanks for the patch! Please see a few comments below.
I've included some questions below just to make sure I give you a v2
that is much closer to correct :)
>
> > ---
> > man/man2/epoll_create.2 | 1 +
> > man/man2/epoll_ctl.2 | 1 +
> > man/man2/ioctl_epoll.2 | 203 ++++++++++++++++++++++++++++++++++++++++
> > man/man7/epoll.7 | 1 +
> > 4 files changed, 206 insertions(+)
> > create mode 100644 man/man2/ioctl_epoll.2
> >
> > diff --git a/man/man2/epoll_create.2 b/man/man2/epoll_create.2
> > index f0327e8ba..2aa1745f5 100644
> > --- a/man/man2/epoll_create.2
> > +++ b/man/man2/epoll_create.2
> > @@ -141,4 +141,5 @@ on overrun.
> > .BR close (2),
> > .BR epoll_ctl (2),
> > .BR epoll_wait (2),
> > +.BR ioctl_epoll (2),
> > .BR epoll (7)
> > diff --git a/man/man2/epoll_ctl.2 b/man/man2/epoll_ctl.2
> > index 6d5bc032e..24bbe7405 100644
> > --- a/man/man2/epoll_ctl.2
> > +++ b/man/man2/epoll_ctl.2
> > @@ -425,5 +425,6 @@ flag.
> > .SH SEE ALSO
> > .BR epoll_create (2),
> > .BR epoll_wait (2),
> > +.BR ioctl_epoll (2),
> > .BR poll (2),
> > .BR epoll (7)
> > diff --git a/man/man2/ioctl_epoll.2 b/man/man2/ioctl_epoll.2
> > new file mode 100644
> > index 000000000..1d53f458e
> > --- /dev/null
> > +++ b/man/man2/ioctl_epoll.2
> > @@ -0,0 +1,203 @@
> > +.\" Copyright (c) 2024, Joe Damato
> > +.\" Written by Joe Damato <jdamato@fastly.com>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.\"
>
> Please use only one consecutive .\"
Thanks, fixed.
> > +.TH ioctl_epoll 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +ioctl_epoll \- ioctl() operations for epoll file descriptors
>
> Please add link pages <man2const/EPIOCSPARAMS.2const> and
> <man2const/EPIOCGPARAMS.2const>.
And after adding those, I'd add those to SEE ALSO and I'd omit the
description of them from the ioctl_epoll.2 page (because they'd have
their own pages) ?
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ", " \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.BR "#include <linux/eventpoll.h>" " /* Definition of " EPIOC* " constants and struct epoll_params */"
> > +.B "#include <sys/ioctl.h>"
> > +.P
> > +.BI "int ioctl(int " fd ", int " op ", void " *argp ");"
>
> The '*' should be bold; not italics.
Thanks, fixed.
> > +.fi
> > +.SH DESCRIPTION
> > +Various
> > +.BR ioctl (2)
> > +operations can be performed on an epoll file descriptor (created by a call
> > +to
> > +.BR epoll_create (2))
> > +(since Linux 6.9 and glibc 2.40) using calls of the form:
> > +.\" commit 18e2bf0edf4dd88d9656ec92395aa47392e85b61
> > +.\" glibc commit 92c270d32caf3f8d5a02b8e46c7ec5d9d0315158
> > +.P
> > +.in +4n
> > +.EX
> > +ioctl(fd, op, argp);
> > +.EE
> > +.in
> > +.P
> > +In the above,
> > +.I fd
> > +is a file descriptor referring to an epoll file descriptor obtained with a
> > +call to
> > +.BR epoll_create (2).
> > +.I op
> > +is one of the operations listed below, and
> > +.I argp
> > +is a pointer to the data structure described below.
>
> If argp is a pointer to a structure, then the prototype should document
> that:
>
> .BI "int ioctl(int " fd ", int " op ", struct epoll_params *" argp );
>
> Also, I would document the two operations separately:
>
> .BI "int ioctl(int " fd ", EPIOCSPARAMS, const struct epoll_params *" argp );
> .BI "int ioctl(int " fd ", EPIOCGPARAMS, struct epoll_params *" argp );
>
> This allows documenting that the 'S' one doesn't modify the argp
> (AFAICS).
Do you mean that I should omit the generic
.BI "int ioctl(int " fd ", int " op ", struct epoll_params *" argp );
and instead have the above with the two ops?
> > +.\"
> > +.P
> > +All supported
> > +.I op
> > +values (described below) use an
> > +.I argp
> > +argument which is a pointer to a
> > +.I epoll_params
> > +structure, defined as:
> > +.P
> > +.in +4n
> > +.EX
> > +struct epoll_params {
> > + uint32_t busy_poll_usecs; /* Number of usecs to busy poll */
> > + uint16_t busy_poll_budget; /* Maximum number of packets to retrieve per poll */
> > + uint8_t prefer_busy_poll; /* Boolean to enable or disable prefer busy poll */
> > +
> > + /* pad the struct to a multiple of 64bits */
> > + uint8_t __pad; /* Must be zero */
> > +};
>
> You could add this type definition to the SYNOPSIS, below the function
> prototypes.
OK, I moved it.
I'm not sure if it is formatted properly, though. I'll see if I can
find other examples of this style to check against.
> > +.EE
> > +.in
> > +.P
> > +The
> > +.I busy_poll_usecs
> > +field denotes the number of microseconds that the network stack will busy
> > +poll. During this time period, the network device will be polled
> > +repeatedly. This value cannot exceed
> > +.B INT_MAX .
> > +.in
> > +.P
> > +The
> > +.I busy_poll_budget
> > +field denotes the maximum number of packets that the network stack will
> > +be retrieved on each poll attempt. This value cannot exceed
> > +.B NAPI_POLL_WEIGHT
> > +which, as of Linux 6.9, is 64, unless the process is run with
> > +.B CAP_NET_ADMIN .
> > +.P
> > +The
> > +.I prefer_busy_poll
> > +field is a boolean field and must be either 0 (disabled) or 1 (enabled). If
> > +enabled, this indicates to the network stack that busy poll is the
> > +preferred method of processing network data and the network stack should
> > +give the application the opportunity to busy poll. Without this option,
> > +very busy systems may continue to do network processing via the normal
> > +method of IRQs triggering softIRQ and NAPI.
> > +.P
> > +The supported
> > +.I op
> > +values:
> > +.TP
> > +.B EPIOCSPARAMS
> > +This operation allows the caller to specify an
> > +.I epoll_params
> > +structure to configure the operation of epoll. Refer to the structure
> > +description of the structure above to learn what configuration is
> > +supported.
> > +.TP
> > +.B EPIOCGPARAMS
> > +This operation allows the caller to retrieve the current
> > +.I epoll_params
> > +structure. This can be used to determine what the current settings are.
> > +.SH RETURN VALUE
> > +On success, 0 is returned.
> > +On failure, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EOPNOTSUPP
> > +The kernel was not compiled with busy poll support.
> > +.TP
> > +.B ENOIOCTLCMD
>
> Is this a thing?
>
> $ grep -rn ENOIOCTLCMD /usr/include/
> $
>
> I suspect this is EINVAL in user space? (Actually, you list the same
> exact error condition for EINVAL below.)
Good catch!
Yes, I looked at the kernel code, it looks like this
becomes ENOTTY before entering userland, but due to the way the code
was written (I'll double check again) I think I wrote this to return
EINVAL.
> > +The specified
> > +.I op
> > +is invalid.
> > +.TP
> > +.B EINVAL
> > +The
> > +.I fd
> > +specified is not an epoll file descriptor, or the
> > +.I op
> > +specified is invalid, or the
> > +.I __pad
> > +was not initialized to zero, or
> > +.I busy_poll_usecs
> > +exceeds
> > +.B INT_MAX ,
> > +or
> > +.I prefer_busy_poll
> > +is not 0 or 1.
>
> Please separate the different conditions for EINVAL into separate
> entries. The ones that are related can go in the same one, but the
> unrelated ones are better split.
Thanks!
I did something like this:
.TP
.B EINVAL
The
.I fd
specified is not an epoll file descriptor.
.TP
.B EINVAL
The
.I op
specified is invalid.
.TP
.B EINVAL
The
.I argp.__pad
was not initialized to zero.
is that what you meant?
> > +.TP
> > +.B EPERM
> > +The process is being run without
> > +.I CAP_NET_ADMIN
>
> This should be bold.
Thanks, fixed.
> > +and the specified
> > +.I busy_poll_budget
>
> This should be
>
> .I argp.busy_poll_budget
Thanks, fixed.
> > +exceeds
> > +.I NAPI_POLL_WEIGHT
>
> This should be bold.
Thanks, fixed.
> > +(which is 64 as of Linux 6.9).
> > +.TP
> > +.B EFAULT
> > +.I argp
> > +does not point to a valid memory address.
> > +.SH EXAMPLES
> > +.EX
>
> Could you write an entire program, with a main()?
>
> If not, it's fine; this is better than nothing. But we prefer having
> entire programs that users can paste and try.
Hmm.. I can give it a try! It's a bit tricky because to actually use
busy polling, you need to have a program that accepts incoming
connections and calls epoll_wait (after setting the appropriate
values with the ioctl).
I could write something simple that does that but it would be a bit
long, I think.
> > +/* Code to set the epoll params to enable busy polling */
> > +\&
> > +int epollfd = epoll_create1(0);
> > +struct epoll_params params;
> > +\&
> > +if (epollfd == \-1) {
> > + perror("epoll_create1");
> > + exit(EXIT_FAILURE);
> > +}
> > +\&
> > +memset(¶ms, 0, sizeof(struct epoll_params));
> > +\&
> > +params.busy_poll_usecs = 25;
> > +params.busy_poll_budget = 8;
> > +params.prefer_busy_poll = 1;
> > +\&
> > +if (ioctl(epollfd, EPIOCSPARAMS, ¶ms) == \-1) {
> > + perror("ioctl");
> > + exit(EXIT_FAILURE);
> > +}
> > +\&
> > +/* Code to show how to retrieve the current settings */
> > +\&
> > +memset(¶ms, 0, sizeof(struct epoll_params));
> > +\&
> > +if (ioctl(epollfd, EPIOCGPARAMS, ¶ms) == \-1) {
> > + perror("ioctl");
> > + exit(EXIT_FAILURE);
> > +}
> > +\&
> > +/* params struct now contains the current parameters */
> > +\&
> > +fprintf(stderr, "epoll usecs: %lu\\n", params.busy_poll_usecs);
> > +fprintf(stderr, "epoll packet budget: %u\\n", params.busy_poll_budget);
> > +fprintf(stderr, "epoll prefer busy poll: %u\\n", params.prefer_busy_poll);
> > +\&
> > +.SH History
> > +Linux 6.9.
> > +glibc 2.40.
> > +.SH SEE ALSO
> > +.BR ioctl (2),
> > +.BR epoll_create (2),
> > +.BR epoll_create1 (2),
> > +.BR epoll (7)
> > +.P
> > +.I Documentation/networking/napi.rst
> > +.P
> > +and
> > +.P
> > +.I Documentation/admin-guide/sysctl/net.rst
> > +.P
> > +in the Linux kernel source tree
>
> We could document that as
>
> .I linux.git/Documentation/...
>
> to not have to say "in the Linux kernel source tree".
OK, I updated the above to be what you suggested.
> > diff --git a/man/man7/epoll.7 b/man/man7/epoll.7
> > index e7892922e..4ad032bdd 100644
> > --- a/man/man7/epoll.7
> > +++ b/man/man7/epoll.7
> > @@ -606,5 +606,6 @@ is present in an epoll instance.
> > .BR epoll_create1 (2),
> > .BR epoll_ctl (2),
> > .BR epoll_wait (2),
> > +.BR ioctl_epoll (2),
> > .BR poll (2),
> > .BR select (2)
> > --
> > 2.34.1
> >
>
> --
> <https://www.alejandro-colomar.es/>
next prev parent reply other threads:[~2024-06-07 2:06 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-04 18:17 [PATCH 0/1] ioctl_epoll.2: Add epoll ioctl documentation Joe Damato
2024-06-04 18:17 ` [PATCH 1/1] ioctl_epoll.2: New page describing ioctl(2) operations for epoll fds Joe Damato
2024-06-06 21:39 ` Alejandro Colomar
2024-06-06 21:46 ` Alejandro Colomar
2024-06-07 21:53 ` Joe Damato
2024-06-09 17:04 ` Alejandro Colomar
2024-06-10 17:29 ` Joe Damato
2024-06-10 18:24 ` Alejandro Colomar
2024-06-10 20:16 ` Joe Damato
2024-06-10 22:21 ` Alejandro Colomar
2024-06-11 12:39 ` G. Branden Robinson
2024-06-11 14:34 ` Alejandro Colomar
2024-06-11 16:06 ` G. Branden Robinson
2024-06-11 16:42 ` Alejandro Colomar
2024-06-07 2:06 ` Joe Damato [this message]
2024-06-07 10:29 ` Alejandro Colomar
2024-06-07 21:56 ` Joe Damato
2024-06-06 21:44 ` [PATCH 0/1] ioctl_epoll.2: Add epoll ioctl documentation Alejandro Colomar
2024-06-06 22:14 ` Joe Damato
2024-06-06 22:25 ` Alejandro Colomar
2024-06-07 2:08 ` Joe Damato
2024-06-07 10:44 ` Alejandro Colomar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZmJrDd-TnNxc1y4-@LQ3V64L9R2 \
--to=jdamato@fastly.com \
--cc=alx@kernel.org \
--cc=linux-man@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox