From: Andrew Morton <akpm@linux-foundation.org>
To: Jason Baron <jbaron@akamai.com>
Cc: dave@stgolabs.net, rpenyaev@suse.de,
linux-kernel@vger.kernel.org, normalperson@yhbt.net,
viro@zeniv.linux.org.uk
Subject: Re: [PATCH] fs/epoll: make nesting accounting safe for -rt kernel
Date: Mon, 24 Feb 2020 16:38:35 -0800 [thread overview]
Message-ID: <20200224163835.08ab964483519052d7c2e39b@linux-foundation.org> (raw)
In-Reply-To: <1579288607-11868-1-git-send-email-jbaron@akamai.com>
On Fri, 17 Jan 2020 14:16:47 -0500 Jason Baron <jbaron@akamai.com> wrote:
> Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
> ep_poll_safewake() can take several non-raw spinlocks after disabling
> pre-emption which is no no for the -rt kernel. So let's re-work how we
> determine the nesting level such that it plays nicely with -rt kernel.
"no no" isn't terribly informative, and knowledge of -rt's requirements
isn't widespread. Can we please spell this requirement out in full
detail, if only to spread the -rt education a bit?
> Let's introduce a 'nests' field in struct eventpoll that records the
> current nesting level during ep_poll_callback(). Then, if we nest again we
> can find the previous struct eventpoll that we were called from and
> increase our count by 1. The 'nests' field is protected by
> ep->poll_wait.lock.
>
> I've also moved napi_id field into a hole in struct eventpoll as part of
> introduing the nests field. This change reduces the struct eventpoll from
> 184 bytes to 176 bytes on x86_64 for the !CONFIG_DEBUG_LOCK_ALLOC
> production config.
>
> ...
>
> @@ -551,30 +557,43 @@ static int ep_call_nested(struct nested_calls *ncalls,
> */
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>
> -static DEFINE_PER_CPU(int, wakeup_nest);
> -
> -static void ep_poll_safewake(wait_queue_head_t *wq)
> +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
> {
> + struct eventpoll *ep_src;
> unsigned long flags;
> - int subclass;
> + u8 nests = 0;
>
> - local_irq_save(flags);
> - preempt_disable();
> - subclass = __this_cpu_read(wakeup_nest);
> - spin_lock_nested(&wq->lock, subclass + 1);
> - __this_cpu_inc(wakeup_nest);
> - wake_up_locked_poll(wq, POLLIN);
> - __this_cpu_dec(wakeup_nest);
> - spin_unlock(&wq->lock);
> - local_irq_restore(flags);
> - preempt_enable();
> + /*
> + * If we are not being call from ep_poll_callback(), epi is
> + * NULL and we are at the first level of nesting, 0. Otherwise,
> + * we are being called from ep_poll_callback() and if a previous
> + * wakeup source is not an epoll file itself, we are at depth
> + * 1 since the wakeup source is depth 0. If the wakeup source
> + * is a previous epoll file in the wakeup chain then we use its
> + * nests value and record ours as nests + 1. The previous epoll
> + * file nests value is stable since its already holding its
> + * own poll_wait.lock.
> + */
Similarly, it would be helpful if this comment were to explain that
this code exists for -rt's requirements, and to briefly describe what
that requirement is.
> + if (epi) {
> + if ((is_file_epoll(epi->ffd.file))) {
> + ep_src = epi->ffd.file->private_data;
> + nests = ep_src->nests;
> + } else {
> + nests = 1;
> + }
> + }
> + spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
> + ep->nests = nests + 1;
> + wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
> + ep->nests = 0;
> + spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
> }
next prev parent reply other threads:[~2020-02-25 0:38 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-04 20:22 [PATCH] epoll: simplify ep_poll_safewake() for CONFIG_DEBUG_LOCK_ALLOC Jason Baron
2019-09-23 15:43 ` Jason Baron
2019-09-23 19:23 ` Roman Penyaev
2019-09-24 17:34 ` Jason Baron
2019-09-24 17:52 ` Roman Penyaev
2020-01-06 19:38 ` [PATCH] fs/epoll: rework safewake " Davidlohr Bueso
2020-01-06 20:28 ` Jason Baron
2020-01-06 21:01 ` Davidlohr Bueso
2020-01-17 19:16 ` [PATCH] fs/epoll: make nesting accounting safe for -rt kernel Jason Baron
2020-02-25 0:38 ` Andrew Morton [this message]
2020-02-26 17:56 ` [PATCH v2] " Jason Baron
2020-03-17 16:34 ` Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200224163835.08ab964483519052d7c2e39b@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=jbaron@akamai.com \
--cc=linux-kernel@vger.kernel.org \
--cc=normalperson@yhbt.net \
--cc=rpenyaev@suse.de \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox