From: Roman Penyaev <rpenyaev@suse.de>
To: Jason Baron <jbaron@akamai.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention
Date: Wed, 05 Dec 2018 12:16:46 +0100 [thread overview]
Message-ID: <38cc83144a2ec332dead4e21214ea068@suse.de> (raw)
In-Reply-To: <45bce871-edfd-c402-acde-2e57e80cc522@akamai.com>
On 2018-12-04 18:23, Jason Baron wrote:
> On 12/3/18 6:02 AM, Roman Penyaev wrote:
[...]
>>
>> ep_set_busy_poll_napi_id(epi);
>>
>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t
>> *wait, unsigned mode, int sync, v
>> */
>> if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>> if (epi->next == EP_UNACTIVE_PTR) {
>> - epi->next = ep->ovflist;
>> - ep->ovflist = epi;
>> + /* Atomically exchange tail */
>> + epi->next = xchg(&ep->ovflist, epi);
>
> This also relies on the fact that the same epi can't be added to the
> list in parallel, because the wait queue doing the wakeup will have the
> wait_queue_head lock. That was a little confusing for me b/c we only
> had
> the read_lock() above.
Yes, that is indeed not obvious path, but wq is locked by
wake_up_*_poll()
call or caller of wake_up_locked_poll() has to be sure wq.lock is taken.
I'll add an explicit comment here, thanks for pointing out.
>
>> if (epi->ws) {
>> /*
>> * Activate ep->ws since epi->ws may get
>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t
>> *wait, unsigned mode, int sync, v
>>
>> /* If this file is already in the ready list we exit soon */
>> if (!ep_is_linked(epi)) {
>> - list_add_tail(&epi->rdllink, &ep->rdllist);
>> + list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
>> ep_pm_stay_awake_rcu(epi);
>> }
>
> same for this.
... and an explicit comment here.
>
>>
>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t
>> *wait, unsigned mode, int sync, v
>> break;
>> }
>> }
>> - wake_up_locked(&ep->wq);
>> + wake_up(&ep->wq);
>
> why the switch here to the locked() variant? Shouldn't the 'reader'
> side, in this case which takes the rwlock for write see all updates in
> a
> coherent state at this point?
lockdep inside __wake_up_common expects wq_head->lock is taken, and
seems this is not a good idea to leave wq naked on wake up path,
when several CPUs can enter wake function. Although
default_wake_function
is protected by spinlock inside try_to_wake_up(), but for example
autoremove_wake_function() can't be called concurrently for the same wq
(it removes wq entry from the list). Also in case of bookmarks
__wake_up_common adds an entry to the list, thus can't be called without
any locks.
I understand you concern and you are right saying that read side sees
wq entries as stable, but that will work only if __wake_up_common does
not modify anything, that is seems true now, but of course it is
too scary to rely on that in the future.
>
>> }
>> if (waitqueue_active(&ep->poll_wait))
>> pwake++;
>>
>> out_unlock:
>> - spin_unlock_irqrestore(&ep->wq.lock, flags);
>> + read_unlock_irqrestore(&ep->lock, flags);
>>
>> /* We have to call this outside the lock */
>> if (pwake)
>> @@ -1489,7 +1520,7 @@ static int ep_insert(struct eventpoll *ep, const
>> struct epoll_event *event,
>> goto error_remove_epi;
>>
>> /* We have to drop the new item inside our item list to keep track
>> of it */
>> - spin_lock_irq(&ep->wq.lock);
>> + write_lock_irq(&ep->lock);
>>
>> /* record NAPI ID of new item if present */
>> ep_set_busy_poll_napi_id(epi);
>> @@ -1501,12 +1532,12 @@ static int ep_insert(struct eventpoll *ep,
>> const struct epoll_event *event,
>>
>> /* Notify waiting tasks that events are available */
>> if (waitqueue_active(&ep->wq))
>> - wake_up_locked(&ep->wq);
>> + wake_up(&ep->wq);
>
> is this necessary to switch as well? Is this to make lockdep happy?
> Looks like there are few more conversions too below...
Yes, necessary, wq.lock should be taken.
--
Roman
next prev parent reply other threads:[~2018-12-05 11:16 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-03 11:02 [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
2018-12-03 17:34 ` Linus Torvalds
2018-12-04 11:50 ` Roman Penyaev
2018-12-04 23:59 ` Andrea Parri
2018-12-05 11:25 ` Roman Penyaev
2018-12-04 17:23 ` Jason Baron
2018-12-04 19:02 ` Paul E. McKenney
2018-12-05 11:22 ` Roman Penyaev
2018-12-05 11:16 ` Roman Penyaev [this message]
2018-12-05 16:38 ` Jason Baron
2018-12-05 20:11 ` Roman Penyaev
2018-12-06 1:54 ` Davidlohr Bueso
2018-12-06 3:08 ` Davidlohr Bueso
2018-12-06 10:27 ` Roman Penyaev
2018-12-06 4:04 ` Davidlohr Bueso
2018-12-06 10:25 ` Roman Penyaev
2018-12-05 23:46 ` Eric Wong
2018-12-06 10:52 ` Roman Penyaev
2018-12-06 20:35 ` Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=38cc83144a2ec332dead4e21214ea068@suse.de \
--to=rpenyaev@suse.de \
--cc=jbaron@akamai.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.