Re: [RESEND RFC PATCH] epoll: autoremove wakers even more aggressively

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Benjamin Segall <bsegall@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Shakeel Butt <shakeelb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Roman Penyaev <rpenyaev@suse.de>, Jason Baron <jbaron@akamai.com>,
	Khazhismel Kumykov <khazhy@google.com>, Heiher <r@hev.cc>
Subject: Re: [RESEND RFC PATCH] epoll: autoremove wakers even more aggressively
Date: Wed, 29 Jun 2022 16:55:42 -0700	[thread overview]
Message-ID: <20220629165542.da7fc8a2a5dbd53cf99572aa@linux-foundation.org> (raw)
In-Reply-To: <xm26fsjotqda.fsf@google.com>

On Wed, 15 Jun 2022 14:24:23 -0700 Benjamin Segall <bsegall@google.com> wrote:

> If a process is killed or otherwise exits while having active network
> connections and many threads waiting on epoll_wait, the threads will all
> be woken immediately, but not removed from ep->wq. Then when network
> traffic scans ep->wq in wake_up, every wakeup attempt will fail, and
> will not remove the entries from the list.
> 
> This means that the cost of the wakeup attempt is far higher than usual,
> does not decrease, and this also competes with the dying threads trying
> to actually make progress and remove themselves from the wq.
> 
> Handle this by removing visited epoll wq entries unconditionally, rather
> than only when the wakeup succeeds - the structure of ep_poll means that
> the only potential loss is the timed_out->eavail heuristic, which now
> can race and result in a redundant ep_send_events attempt. (But only
> when incoming data and a timeout actually race, not on every timeout)
>

Thanks.  I added people from 412895f03cbf96 ("epoll: atomically remove
wait entry on wake up") to cc.  Hopefully someone there can help review
and maybe test this.


> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index e2daa940ebce..8b56b94e2f56 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1745,10 +1745,25 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
>  	ktime_get_ts64(&now);
>  	*to = timespec64_add_safe(now, *to);
>  	return to;
>  }
>  
> +/*
> + * autoremove_wake_function, but remove even on failure to wake up, because we
> + * know that default_wake_function/ttwu will only fail if the thread is already
> + * woken, and in that case the ep_poll loop will remove the entry anyways, not
> + * try to reuse it.
> + */
> +static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
> +				       unsigned int mode, int sync, void *key)
> +{
> +	int ret = default_wake_function(wq_entry, mode, sync, key);
> +
> +	list_del_init(&wq_entry->entry);
> +	return ret;
> +}
> +
>  /**
>   * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
>   *           event buffer.
>   *
>   * @ep: Pointer to the eventpoll context.
> @@ -1826,12 +1841,19 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  		 * chance to harvest new event. Otherwise wakeup can be
>  		 * lost. This is also good performance-wise, because on
>  		 * normal wakeup path no need to call __remove_wait_queue()
>  		 * explicitly, thus ep->lock is not taken, which halts the
>  		 * event delivery.
> +		 *
> +		 * In fact, we now use an even more aggressive function that
> +		 * unconditionally removes, because we don't reuse the wait
> +		 * entry between loop iterations. This lets us also avoid the
> +		 * performance issue if a process is killed, causing all of its
> +		 * threads to wake up without being removed normally.
>  		 */
>  		init_wait(&wait);
> +		wait.func = ep_autoremove_wake_function;
>  
>  		write_lock_irq(&ep->lock);
>  		/*
>  		 * Barrierless variant, waitqueue_active() is called under
>  		 * the same lock on wakeup ep_poll_callback() side, so it
> -- 
> 2.36.1.476.g0c4daa206d-goog

next prev parent reply	other threads:[~2022-06-29 23:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-15 21:24 [RESEND RFC PATCH] epoll: autoremove wakers even more aggressively Benjamin Segall
2022-06-29 23:55 ` Andrew Morton [this message]
2022-06-30  1:12   ` Shakeel Butt
2022-06-30  2:24     ` Andrew Morton
2022-06-30 14:59       ` Shakeel Butt
2022-07-16  1:27         ` Shakeel Butt
2022-07-16  4:55           ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220629165542.da7fc8a2a5dbd53cf99572aa@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=bsegall@google.com \
    --cc=edumazet@google.com \
    --cc=jbaron@akamai.com \
    --cc=khazhy@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=r@hev.cc \
    --cc=rpenyaev@suse.de \
    --cc=shakeelb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox