linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joe Damato <jdamato@fastly.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Martin Karsten <mkarsten@uwaterloo.ca>,
	Samiullah Khawaja <skhawaja@google.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	netdev@vger.kernel.org, amritha.nambiar@intel.com,
	sridhar.samudrala@intel.com,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Breno Leitao <leitao@debian.org>,
	Christian Brauner <brauner@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Jan Kara <jack@suse.cz>,
	Jiri Pirko <jiri@resnulli.us>,
	Johannes Berg <johannes.berg@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	"open list:FILESYSTEMS (VFS and infrastructure)"
	<linux-fsdevel@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll
Date: Sat, 17 Aug 2024 11:00:56 +0100	[thread overview]
Message-ID: <ZsB02OnFM1IhLkAt@LQ3V64L9R2> (raw)
In-Reply-To: <66bf85f635b2e_184d66294b9@willemb.c.googlers.com.notmuch>

On Fri, Aug 16, 2024 at 01:01:42PM -0400, Willem de Bruijn wrote:
> Joe Damato wrote:
> > On Fri, Aug 16, 2024 at 10:59:51AM -0400, Willem de Bruijn wrote:
[...]
> > > Willem de Bruijn wrote:
> > > If the only goal is to safely reenable interrupts when the application
> > > stops calling epoll_wait, does this have to be user tunable?
> > > 
> > > Can it be either a single good enough constant, or derived from
> > > another tunable, like busypoll_read.
> > 
> > I believe you meant busy_read here, is that right?
> > 
> > At any rate:
> > 
> >   - I don't think a single constant is appropriate, just as it
> >     wasn't appropriate for the existing mechanism
> >     (napi_defer_hard_irqs/gro_flush_timeout), and
> > 
> >   - Deriving the value from a pre-existing parameter to preserve the
> >     ABI, like busy_read, makes using this more confusing for users
> >     and complicates the API significantly.
> > 
> > I agree we should get the API right from the start; that's why we've
> > submit this as an RFC ;)
> > 
> > We are happy to take suggestions from the community, but, IMHO,
> > re-using an existing parameter for a different purpose only in
> > certain circumstances (if I understand your suggestions) is a much
> > worse choice than adding a new tunable that clearly states its
> > intended singular purpose.
> 
> Ack. I was thinking whether an epoll flag through your new epoll
> ioctl interface to toggle the IRQ suspension (and timer start)
> would be preferable. Because more fine grained.

I understand why you are asking about this and I think it would be
great if this were possible, but unfortunately it isn't.

epoll contexts can be associated with any NAPI ID, but the IRQ
suspension is NAPI specific.

As an example: imagine a user program creates an epoll context and
adds fds with NAPI ID 1111 to the context. It then issues the ioctl
to set the suspend timeout for that context. Then, for whatever
reason, the user app decides to remove all the fds and add new ones,
this time from NAPI ID 2222, which happens to be a different
net_device.

What does that mean for the suspend timeout? It's not clear to me
what the right behavior would be in this situation (does it persist?
does it get cleared when a new NAPI ID is added? etc) and it makes
the user API much more complicated, with many more edge cases and
possible bugs.

> Also, the value is likely dependent more on the expected duration
> of userspace processing? If so, it would be the same for all
> devices, so does a per-netdev value make sense?

There is presently no way to set values like gro_timeout,
defer_hard_irqs, or this new proposed value on a per-NAPI basis.
IMHO, that is really where all of these values should live.

I mentioned on the list previously (and also in the cover letter),
that time permitting, I think the correct evolution of this would be
to support per-NAPI settings (via netdev-genl, I assume) so that
user programs can set all 3 values on only the NAPIs they care
about.

Until that functionality is available, it would seem per-netdev is
the only way for this feature to be added at the present time. I
simply haven't had the time to add the above interface. This
feature we're proposing has demonstrable performance value, but it
doesn't seem sensible to block it until time permits me to add a
per-NAPI interface for all of these values given that we already
globally expose 2 of them.

That said, I appreciate the thoughtfulness of your replies and I am
open to other suggestions.

- Joe

  parent reply	other threads:[~2024-08-17 10:01 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-12 12:57 [RFC net-next 0/5] Suspend IRQs during preferred busy poll Joe Damato
2024-08-12 12:57 ` [RFC net-next 4/5] eventpoll: Trigger napi_busy_loop, if prefer_busy_poll is set Joe Damato
2024-08-12 13:19   ` Christoph Hellwig
2024-08-12 16:17     ` Matthew Wilcox
2024-08-12 17:49       ` Joe Damato
2024-08-12 17:46     ` Joe Damato
2024-08-12 12:57 ` [RFC net-next 5/5] eventpoll: Control irq suspension for prefer_busy_poll Joe Damato
2024-08-12 20:20   ` Stanislav Fomichev
2024-08-12 21:47     ` Martin Karsten
2024-08-12 20:19 ` [RFC net-next 0/5] Suspend IRQs during preferred busy poll Stanislav Fomichev
2024-08-12 21:46   ` Martin Karsten
2024-08-12 23:03     ` Stanislav Fomichev
2024-08-13  0:04       ` Martin Karsten
2024-08-13  1:54         ` Stanislav Fomichev
2024-08-13  2:35           ` Martin Karsten
2024-08-13  4:07             ` Stanislav Fomichev
2024-08-13 13:18               ` Martin Karsten
2024-08-14  3:16                 ` Willem de Bruijn
2024-08-14 14:19                   ` Joe Damato
2024-08-14 15:08                     ` Willem de Bruijn
2024-08-14 15:46                       ` Joe Damato
2024-08-14 19:53                 ` Samiullah Khawaja
2024-08-14 20:42                   ` Martin Karsten
2024-08-16 14:27                     ` Willem de Bruijn
2024-08-16 14:59                       ` Willem de Bruijn
2024-08-16 15:25                         ` Joe Damato
2024-08-16 17:01                           ` Willem de Bruijn
2024-08-16 20:03                             ` Martin Karsten
2024-08-16 20:58                               ` Willem de Bruijn
2024-08-17 18:15                                 ` Martin Karsten
2024-08-18 12:55                                   ` Willem de Bruijn
2024-08-18 14:51                                     ` Martin Karsten
2024-08-20  2:36                                       ` Jakub Kicinski
2024-08-20 14:28                                         ` Martin Karsten
2024-08-17 10:00                             ` Joe Damato [this message]
2024-08-14  0:10     ` Jakub Kicinski
2024-08-14  1:14       ` Martin Karsten
2024-08-20  2:07         ` Jakub Kicinski
2024-08-20 14:27           ` Martin Karsten

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZsB02OnFM1IhLkAt@LQ3V64L9R2 \
    --to=jdamato@fastly.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=amritha.nambiar@intel.com \
    --cc=bigeasy@linutronix.de \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jack@suse.cz \
    --cc=jiri@resnulli.us \
    --cc=johannes.berg@intel.com \
    --cc=kuba@kernel.org \
    --cc=leitao@debian.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lorenzo@kernel.org \
    --cc=mkarsten@uwaterloo.ca \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=skhawaja@google.com \
    --cc=sridhar.samudrala@intel.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).