public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Jason Baron <jbaron@akamai.com>
Cc: dave@stgolabs.net, rpenyaev@suse.de,
	linux-kernel@vger.kernel.org, normalperson@yhbt.net,
	viro@zeniv.linux.org.uk
Subject: Re: [PATCH] fs/epoll: make nesting accounting safe for -rt kernel
Date: Mon, 24 Feb 2020 16:38:35 -0800	[thread overview]
Message-ID: <20200224163835.08ab964483519052d7c2e39b@linux-foundation.org> (raw)
In-Reply-To: <1579288607-11868-1-git-send-email-jbaron@akamai.com>

On Fri, 17 Jan 2020 14:16:47 -0500 Jason Baron <jbaron@akamai.com> wrote:

> Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
> ep_poll_safewake() can take several non-raw spinlocks after disabling
> pre-emption which is no no for the -rt kernel. So let's re-work how we
> determine the nesting level such that it plays nicely with -rt kernel.

"no no" isn't terribly informative, and knowledge of -rt's requirements
isn't widespread.  Can we please spell this requirement out in full
detail, if only to spread the -rt education a bit?

> Let's introduce a 'nests' field in struct eventpoll that records the
> current nesting level during ep_poll_callback(). Then, if we nest again we
> can find the previous struct eventpoll that we were called from and
> increase our count by 1. The 'nests' field is protected by
> ep->poll_wait.lock.
> 
> I've also moved napi_id field into a hole in struct eventpoll as part of
> introduing the nests field. This change reduces the struct eventpoll from
> 184 bytes to 176 bytes on x86_64 for the !CONFIG_DEBUG_LOCK_ALLOC
> production config.
> 
> ...
>
> @@ -551,30 +557,43 @@ static int ep_call_nested(struct nested_calls *ncalls,
>   */
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  
> -static DEFINE_PER_CPU(int, wakeup_nest);
> -
> -static void ep_poll_safewake(wait_queue_head_t *wq)
> +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
>  {
> +	struct eventpoll *ep_src;
>  	unsigned long flags;
> -	int subclass;
> +	u8 nests = 0;
>  
> -	local_irq_save(flags);
> -	preempt_disable();
> -	subclass = __this_cpu_read(wakeup_nest);
> -	spin_lock_nested(&wq->lock, subclass + 1);
> -	__this_cpu_inc(wakeup_nest);
> -	wake_up_locked_poll(wq, POLLIN);
> -	__this_cpu_dec(wakeup_nest);
> -	spin_unlock(&wq->lock);
> -	local_irq_restore(flags);
> -	preempt_enable();
> +	/*
> +	 * If we are not being call from ep_poll_callback(), epi is
> +	 * NULL and we are at the first level of nesting, 0. Otherwise,
> +	 * we are being called from ep_poll_callback() and if a previous
> +	 * wakeup source is not an epoll file itself, we are at depth
> +	 * 1 since the wakeup source is depth 0. If the wakeup source
> +	 * is a previous epoll file in the wakeup chain then we use its
> +	 * nests value and record ours as nests + 1. The previous epoll
> +	 * file nests value is stable since its already holding its
> +	 * own poll_wait.lock.
> +	 */

Similarly, it would be helpful if this comment were to explain that
this code exists for -rt's requirements, and to briefly describe what
that requirement is.

> +	if (epi) {
> +		if ((is_file_epoll(epi->ffd.file))) {
> +			ep_src = epi->ffd.file->private_data;
> +			nests = ep_src->nests;
> +		} else {
> +			nests = 1;
> +		}
> +	}
> +	spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
> +	ep->nests = nests + 1;
> +	wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
> +	ep->nests = 0;
> +	spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
>  }


  reply	other threads:[~2020-02-25  0:38 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04 20:22 [PATCH] epoll: simplify ep_poll_safewake() for CONFIG_DEBUG_LOCK_ALLOC Jason Baron
2019-09-23 15:43 ` Jason Baron
2019-09-23 19:23   ` Roman Penyaev
2019-09-24 17:34     ` Jason Baron
2019-09-24 17:52       ` Roman Penyaev
2020-01-06 19:38         ` [PATCH] fs/epoll: rework safewake " Davidlohr Bueso
2020-01-06 20:28           ` Jason Baron
2020-01-06 21:01             ` Davidlohr Bueso
2020-01-17 19:16               ` [PATCH] fs/epoll: make nesting accounting safe for -rt kernel Jason Baron
2020-02-25  0:38                 ` Andrew Morton [this message]
2020-02-26 17:56                   ` [PATCH v2] " Jason Baron
2020-03-17 16:34                     ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200224163835.08ab964483519052d7c2e39b@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=jbaron@akamai.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=normalperson@yhbt.net \
    --cc=rpenyaev@suse.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox