From: Martin Karsten <mkarsten@uwaterloo.ca>
To: Stanislav Fomichev <sdf@fomichev.me>
Cc: netdev@vger.kernel.org, Joe Damato <jdamato@fastly.com>,
	amritha.nambiar@intel.com, sridhar.samudrala@intel.com,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Breno Leitao <leitao@debian.org>,
	Christian Brauner <brauner@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Jan Kara <jack@suse.cz>,
	Jiri Pirko <jiri@resnulli.us>,
	Johannes Berg <johannes.berg@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	"open list:FILESYSTEMS (VFS and infrastructure)"
	<linux-fsdevel@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC net-next 0/5] Suspend IRQs during preferred busy poll
Date: Mon, 12 Aug 2024 17:46:42 -0400
Message-ID: <2bb121dd-3dcd-4142-ab87-02ccf4afd469@uwaterloo.ca>
In-Reply-To: <ZrpuWMoXHxzPvvhL@mini-arch>

On 2024-08-12 16:19, Stanislav Fomichev wrote:
> On 08/12, Joe Damato wrote:
>> Greetings:
>>
>> Martin Karsten (CC'd) and I have been collaborating on some ideas about
>> ways of reducing tail latency when using epoll-based busy poll and we'd
>> love to get feedback from the list on the code in this series. This is
>> the idea I mentioned at netdev conf, for those who were there. Barring
>> any major issues, we hope to submit this officially shortly after RFC.
>>
>> The basic idea for suspending IRQs in this manner was described in an
>> earlier paper presented at Sigmetrics 2024 [1].
> 
> Let me explicitly call out the paper. Very nice analysis!

Thank you!

[snip]

>> Here's how it is intended to work:
>>    - An administrator sets the existing sysfs parameters for
>>      defer_hard_irqs and gro_flush_timeout to enable IRQ deferral.
>>
>>    - An administrator sets the new sysfs parameter irq_suspend_timeout
>>      to a larger value than gro-timeout to enable IRQ suspension.
> 
> Can you expand more on what's the problem with the existing gro_flush_timeout?
> Is it defer_hard_irqs_count? Or you want a separate timeout only for the
> prefer_busy_poll case (why?)? Because looking at the first two patches,
> you essentially replace all usages of gro_flush_timeout with a new variable
> and I don't see how it helps.

gro-flush-timeout (in combination with defer-hard-irqs) is the default
irq deferral mechanism and, as such, is always active when configured. Its
static periodic softirq processing leads to a situation where:

- A long gro-flush-timeout causes high latencies when load is 
sufficiently below capacity, or

- a short gro-flush-timeout causes overhead when softirq execution 
asynchronously competes with application processing at high load.

The shortcomings of this are documented (to some extent) by our
experiments: defer20 works well at low load but has problems at high
load, while defer200 has higher latency at low load.
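
A back-of-the-envelope sketch of this tradeoff, under simplifying
assumptions: gro-flush-timeout is given in nanoseconds, defer20 and
defer200 stand for 20,000 ns and 200,000 ns (following the "X,000"
convention of the suspendX setups), and a packet that arrives just after
a poll waits for the next timer. The helper names are made up for
illustration and are not part of the series.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rough upper bound (in ns) on the extra delay a
 * packet sees if it arrives right after a timer-driven poll and has to
 * wait for the next timer expiry. Simplified model, not kernel code.
 */
uint64_t worst_case_wait_ns(uint64_t gro_flush_timeout_ns)
{
	return gro_flush_timeout_ns;
}

/*
 * Hypothetical helper: how many timer-driven softirq polls per second
 * a given period implies. Each of these competes with the application
 * for the CPU at high load.
 */
uint64_t polls_per_second(uint64_t gro_flush_timeout_ns)
{
	if (gro_flush_timeout_ns == 0)
		return 0; /* timer-based deferral disabled */
	return 1000000000ULL / gro_flush_timeout_ns;
}
```

In this model the shorter period polls ten times as often, while the
longer one may add up to ten times the wait when load is low.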

irq-suspend-timeout is only active when an application uses
prefer-busy-polling. In that case, it produces a nice alternating
pattern of application processing and networking processing (similar to
what we describe in the paper). This works well at both low and
high load.

> Maybe expand more on what code paths are we trying to improve? Existing
> busy polling code is not super readable, so would be nice to simplify
> it a bit in the process (if possible) instead of adding one more tunable.

There are essentially three possible loops for network processing:

1) hardirq -> softirq -> napi poll; this is the baseline functionality

2) timer -> softirq -> napi poll; this is the deferred irq processing
scheme with the shortcomings described above

3) epoll -> busy-poll -> napi poll

If a system is configured for 1), not much can be done, as it is 
difficult to interject anything into this loop without adding state and 
side effects. This is what we tried for the paper, but it ended up being 
a hack.

If, however, the system is configured for irq deferral, Loops 2) and 3)
"wrestle" with each other for control. Injecting the larger
irq-suspend-timeout for 'timer' in Loop 2) essentially tilts this in
favour of Loop 3) and creates the nice pattern described above.
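
The effect on the 'timer' in Loop 2) can be pictured with a toy model.
This is only a sketch of the idea, not the code from the patches: the
struct, its fields, and the function name are hypothetical, and the
exact conditions in the series may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the per-NAPI timer settings; not a kernel structure. */
struct napi_timer_model {
	uint64_t gro_flush_timeout_ns;    /* regular deferral period */
	uint64_t irq_suspend_timeout_ns;  /* larger period, 0 = disabled */
	bool prefer_busy_poll_active;     /* application drives Loop 3) */
};

/*
 * Pick the period for the next timer -> softirq -> napi poll round.
 * While the application is busy polling with prefer-busy-poll, the
 * larger timeout keeps Loop 2) out of the way; otherwise the regular
 * gro-flush-timeout applies.
 */
uint64_t next_timer_ns(const struct napi_timer_model *n)
{
	if (n->prefer_busy_poll_active && n->irq_suspend_timeout_ns != 0)
		return n->irq_suspend_timeout_ns;
	return n->gro_flush_timeout_ns;
}
```

With the suspend20 values (20,000 and 20,000,000), the timer period in
this model grows by a factor of 1000 while the application is polling.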

[snip]

>>    - suspendX:
>>      - set defer_hard_irqs to 100
>>      - set gro_flush_timeout to X,000
>>      - set irq_suspend_timeout to 20,000,000
>>      - enable busy poll via the existing ioctl (busy_poll_usecs = 0,
>>        busy_poll_budget = 64, prefer_busy_poll = true)
> 
> What's the intention of `busy_poll_usecs = 0` here? Presumably we fallback
> to busy_poll sysctl value?

Before this patch set, ep_poll only calls napi_busy_loop if busy_poll
(sysctl) or busy_poll_usecs is nonzero. However, a nonzero value might
lead to busy-polling even when the application does not actually need or
want it. Only one iteration through the busy loop is needed to make the
new scheme work. Additional napi busy polling beyond that is optional.
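
For reference, a minimal sketch of how an application might pass the
suspendX settings through the existing epoll ioctl. To avoid depending
on a particular header version, it mirrors what I understand to be the
layout of struct epoll_params and the EPIOCSPARAMS request number from
the uapi header; the local names (busy_poll_params,
BUSY_POLL_SET_PARAMS, suspend_params, apply_params) are my own, and the
layout should be checked against your kernel headers.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed to match struct epoll_params in the Linux uapi header. */
struct busy_poll_params {
	uint32_t busy_poll_usecs;
	uint16_t busy_poll_budget;
	uint8_t  prefer_busy_poll;
	uint8_t  pad;              /* pads the struct to 64 bits */
};

/* Assumed to equal EPIOCSPARAMS: _IOW(0x8A, 0x01, struct epoll_params). */
#define BUSY_POLL_SET_PARAMS _IOW(0x8A, 0x01, struct busy_poll_params)

/* The values used in the suspendX configurations. */
struct busy_poll_params suspend_params(void)
{
	struct busy_poll_params p = {
		.busy_poll_usecs  = 0,  /* no extra busy-poll time requested */
		.busy_poll_budget = 64,
		.prefer_busy_poll = 1,
		.pad              = 0,
	};
	return p;
}

/* Returns 0 on success, -1 with errno set if the kernel rejects it. */
int apply_params(int epfd, const struct busy_poll_params *p)
{
	return ioctl(epfd, BUSY_POLL_SET_PARAMS, p);
}
```

A caller would create the epoll instance as usual (for example with
epoll_create1(0)) and pass its descriptor to apply_params(). On kernels
without this ioctl the call fails and returns -1.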

Thanks,
Martin

