From: Andrew Morton <akpm@linux-foundation.org>
To: Jason Baron <jbaron@akamai.com>
Cc: dave@stgolabs.net, rpenyaev@suse.de,
linux-kernel@vger.kernel.org, normalperson@yhbt.net,
viro@zeniv.linux.org.uk
Subject: Re: [PATCH] fs/epoll: make nesting accounting safe for -rt kernel
Date: Mon, 24 Feb 2020 16:38:35 -0800 [thread overview]
Message-ID: <20200224163835.08ab964483519052d7c2e39b@linux-foundation.org> (raw)
In-Reply-To: <1579288607-11868-1-git-send-email-jbaron@akamai.com>
On Fri, 17 Jan 2020 14:16:47 -0500 Jason Baron <jbaron@akamai.com> wrote:
> Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
> ep_poll_safewake() can take several non-raw spinlocks after disabling
> pre-emption which is no no for the -rt kernel. So let's re-work how we
> determine the nesting level such that it plays nicely with -rt kernel.
"no no" isn't terribly informative, and knowledge of -rt's requirements
isn't widespread. Can we please spell this requirement out in full
detail, if only to spread the -rt education a bit?
> Let's introduce a 'nests' field in struct eventpoll that records the
> current nesting level during ep_poll_callback(). Then, if we nest again we
> can find the previous struct eventpoll that we were called from and
> increase our count by 1. The 'nests' field is protected by
> ep->poll_wait.lock.
>
> I've also moved napi_id field into a hole in struct eventpoll as part of
> introduing the nests field. This change reduces the struct eventpoll from
> 184 bytes to 176 bytes on x86_64 for the !CONFIG_DEBUG_LOCK_ALLOC
> production config.
>
> ...
>
> @@ -551,30 +557,43 @@ static int ep_call_nested(struct nested_calls *ncalls,
> */
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>
> -static DEFINE_PER_CPU(int, wakeup_nest);
> -
> -static void ep_poll_safewake(wait_queue_head_t *wq)
> +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
> {
> + struct eventpoll *ep_src;
> unsigned long flags;
> - int subclass;
> + u8 nests = 0;
>
> - local_irq_save(flags);
> - preempt_disable();
> - subclass = __this_cpu_read(wakeup_nest);
> - spin_lock_nested(&wq->lock, subclass + 1);
> - __this_cpu_inc(wakeup_nest);
> - wake_up_locked_poll(wq, POLLIN);
> - __this_cpu_dec(wakeup_nest);
> - spin_unlock(&wq->lock);
> - local_irq_restore(flags);
> - preempt_enable();
> + /*
> + * If we are not being call from ep_poll_callback(), epi is
> + * NULL and we are at the first level of nesting, 0. Otherwise,
> + * we are being called from ep_poll_callback() and if a previous
> + * wakeup source is not an epoll file itself, we are at depth
> + * 1 since the wakeup source is depth 0. If the wakeup source
> + * is a previous epoll file in the wakeup chain then we use its
> + * nests value and record ours as nests + 1. The previous epoll
> + * file nests value is stable since its already holding its
> + * own poll_wait.lock.
> + */
Similarly, it would be helpful if this comment were to explain that
this code exists for -rt's requirements, and to briefly describe what
that requirement is.
> + if (epi) {
> + if ((is_file_epoll(epi->ffd.file))) {
> + ep_src = epi->ffd.file->private_data;
> + nests = ep_src->nests;
> + } else {
> + nests = 1;
> + }
> + }
> + spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
> + ep->nests = nests + 1;
> + wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
> + ep->nests = 0;
> + spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
> }
next prev parent reply other threads:[~2020-02-25 0:38 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-04 20:22 [PATCH] epoll: simplify ep_poll_safewake() for CONFIG_DEBUG_LOCK_ALLOC Jason Baron
2019-09-23 15:43 ` Jason Baron
2019-09-23 19:23 ` Roman Penyaev
2019-09-24 17:34 ` Jason Baron
2019-09-24 17:52 ` Roman Penyaev
2020-01-06 19:38 ` [PATCH] fs/epoll: rework safewake " Davidlohr Bueso
2020-01-06 20:28 ` Jason Baron
2020-01-06 21:01 ` Davidlohr Bueso
2020-01-17 19:16 ` [PATCH] fs/epoll: make nesting accounting safe for -rt kernel Jason Baron
2020-02-25 0:38 ` Andrew Morton [this message]
2020-02-26 17:56 ` [PATCH v2] " Jason Baron
2020-03-17 16:34 ` Davidlohr Bueso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200224163835.08ab964483519052d7c2e39b@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=jbaron@akamai.com \
--cc=linux-kernel@vger.kernel.org \
--cc=normalperson@yhbt.net \
--cc=rpenyaev@suse.de \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.