From: Eric Wong <normalperson@yhbt.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hans Verkuil <hans.verkuil@cisco.com>,
Jiri Olsa <jolsa@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
Al Viro <viro@zeniv.linux.org.uk>,
Davide Libenzi <davidel@xmailserver.org>,
Hans de Goede <hdegoede@redhat.com>,
Mauro Carvalho Chehab <mchehab@infradead.org>,
David Miller <davem@davemloft.net>,
Andrew Morton <akpm@linux-foundation.org>,
Andreas Voellmy <andreas.voellmy@yale.edu>,
"Junchang(Jason) Wang" <junchang.wang@yale.edu>,
Network Development <netdev@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD
Date: Tue, 1 Jan 2013 23:56:05 +0000
Message-ID: <20130101235605.GA17168@dcvr.yhbt.net>
In-Reply-To: <CA+55aFwP3Tvmfh23KDDXhB3k=RS8pNhqasw5vuBUHw4TxrakOQ@mail.gmail.com>
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Please document the barrier that this mb() pairs with, and then give
> an explanation for the fix in the commit message, and I'll happily
> take it. Even if it's just duplicating the comments above the
> wq_has_sleeper() function, except modified for the ep_modify() case.

Hopefully my explanation below is correct and makes sense.
I think both effects of the barrier are needed.

> Of course, it would be good to get verification from Jason and Andreas
> that the alternate patch also works for them.

Jason just confirmed it.
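
The "either side (or both) will notice" argument in the commit message
below can be modeled in userspace as a store-buffering pattern. This is
only a minimal sketch under stated assumptions: it uses C11 seq_cst
fences as a stand-in for smp_mb(), and every name in it (interest_mask,
fd_ready, modifier, waker, count_missed_events) is hypothetical and not
a kernel symbol. It does not reproduce the locking in ep_poll_callback;
it only shows why a full barrier on both sides rules out the case where
both sides miss the event.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
static atomic_int interest_mask;   /* models epi->event.events */
static atomic_int fd_ready;        /* models the file becoming ready */
static int modifier_saw_ready;     /* what the "f_op->poll()" side observed */
static int waker_saw_mask;         /* what the "ep_poll_callback" side observed */

/* Models ep_modify(): publish the new mask, full barrier, then poll. */
static void *modifier(void *arg)
{
	(void)arg;
	atomic_store_explicit(&interest_mask, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for smp_mb() */
	modifier_saw_ready =
		atomic_load_explicit(&fd_ready, memory_order_relaxed);
	return NULL;
}

/* Models the wakeup side: mark ready, full barrier, then read the mask. */
static void *waker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&fd_ready, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* the paired barrier */
	waker_saw_mask =
		atomic_load_explicit(&interest_mask, memory_order_relaxed);
	return NULL;
}

/* Runs the race repeatedly; counts trials where BOTH sides saw nothing. */
static int count_missed_events(int trials)
{
	int missed = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t a, b;
		int rc;

		atomic_store(&interest_mask, 0);
		atomic_store(&fd_ready, 0);

		rc = pthread_create(&a, NULL, modifier, NULL);
		assert(rc == 0);
		rc = pthread_create(&b, NULL, waker, NULL);
		assert(rc == 0);
		(void)rc;

		/* Joining makes the plain int results safe to read here. */
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (!modifier_saw_ready && !waker_saw_mask)
			missed++;
	}
	return missed;
}
```

Calling count_missed_events(n) from a test program should return 0 for
any n, since the two fences forbid the outcome where each thread reads
the other's old value. Dropping either fence makes that outcome
permitted by the memory model, which is the userspace analogue of the
missed event.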
------------------------------- 8< ----------------------------
From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Tue, 1 Jan 2013 21:20:27 +0000
Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD
EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed. Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occurred before we modified the interest mask. So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
Cc: netdev@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
---
fs/eventpoll.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cd96649..39573ee 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1285,7 +1285,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * otherwise we might miss an event that happens between the
 	 * f_op->poll() call and the new event set registering.
 	 */
-	epi->event.events = event->events;
+	epi->event.events = event->events; /* need barrier below */
 	pt._key = event->events;
 	epi->event.data = event->data; /* protected by mtx */
 	if (epi->event.events & EPOLLWAKEUP) {
@@ -1296,6 +1296,26 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	}

 	/*
+	 * The following barrier has two effects:
+	 *
+	 * 1) Flush epi changes above to other CPUs. This ensures
+	 *    we do not miss events from ep_poll_callback if an
+	 *    event occurs immediately after we call f_op->poll().
+	 *    We need this because we did not take ep->lock while
+	 *    changing epi above (but ep_poll_callback does take
+	 *    ep->lock).
+	 *
+	 * 2) We also need to ensure we do not miss _past_ events
+	 *    when calling f_op->poll(). This barrier also
+	 *    pairs with the barrier in wq_has_sleeper (see
+	 *    comments for wq_has_sleeper).
+	 *
+	 * This barrier will now guarantee ep_poll_callback or f_op->poll
+	 * (or both) will notice the readiness of an item.
+	 */
+	smp_mb();
+
+	/*
 	 * Get current event bits. We can safely use the file* here because
 	 * its usage count has been increased by the caller of this function.
 	 */
--
Eric Wong