From: Eric Wong <normalperson@yhbt.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Hans Verkuil <hans.verkuil@cisco.com>,
	Jiri Olsa <jolsa@redhat.com>, Jonathan Corbet <corbet@lwn.net>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Davide Libenzi <davidel@xmailserver.org>,
	Hans de Goede <hdegoede@redhat.com>,
	Mauro Carvalho Chehab <mchehab@infradead.org>,
	David Miller <davem@davemloft.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andreas Voellmy <andreas.voellmy@yale.edu>,
	"Junchang(Jason) Wang" <junchang.wang@yale.edu>,
	Network Development <netdev@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD
Date: Tue, 1 Jan 2013 23:56:05 +0000	[thread overview]
Message-ID: <20130101235605.GA17168@dcvr.yhbt.net> (raw)
In-Reply-To: <CA+55aFwP3Tvmfh23KDDXhB3k=RS8pNhqasw5vuBUHw4TxrakOQ@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Please document the barrier that this mb() pairs with, and then give
> an explanation for the fix in the commit message, and I'll happily
> take it. Even if it's just duplicating the comments above the
> wq_has_sleeper() function, except modified for the ep_modify() case.

Hopefully my explanation below is correct and makes sense.
I think both effects of the barrier are needed.

> Of course, it would be good to get verification from Jason and Andreas
> that the alternate patch also works for them.

Jason just confirmed it.

------------------------------- 8< ----------------------------
From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Tue, 1 Jan 2013 21:20:27 +0000
Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD

EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed.  Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occurred before we modified the interest mask.  So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.
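A minimal user-space sketch of the barrier pairing described above. It assumes C11 atomics and pthreads, with atomic_thread_fence(memory_order_seq_cst) standing in for the kernel's smp_mb(). All names (interest, ready, modifier, waker, count_missed_events) are hypothetical and do not come from fs/eventpoll.c. "interest" models epi->event.events, which can be a mask that ep_poll_callback ignores (for example a disarmed EPOLLONESHOT item) until EPOLL_CTL_MOD re-arms it. "ready" models the file's readiness state that f_op->poll() reads.

```c
/*
 * Hypothetical user-space model of the ep_modify() vs. wakeup race.
 * Not kernel code: seq_cst fences stand in for smp_mb().
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int interest;        /* models epi->event.events */
static atomic_int ready;           /* models the file's readiness state */
static bool poll_saw_ready;        /* "f_op->poll()" noticed readiness */
static bool callback_saw_interest; /* "ep_poll_callback" saw the new mask */

/* CPU 0: ep_modify() stores the new mask, barrier, then polls. */
static void *modifier(void *arg)
{
    (void)arg;
    atomic_store_explicit(&interest, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* smp_mb() in ep_modify() */
    poll_saw_ready = atomic_load_explicit(&ready, memory_order_relaxed) != 0;
    return NULL;
}

/* CPU 1: an event arrives; set readiness, barrier, then check the mask. */
static void *waker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* barrier in wq_has_sleeper() */
    callback_saw_interest =
        atomic_load_explicit(&interest, memory_order_relaxed) != 0;
    return NULL;
}

/*
 * Runs the race `iterations` times and returns how many runs lost the
 * event on both sides (neither poll nor callback noticed it), or -1 if
 * a thread could not be created.
 */
static int count_missed_events(int iterations)
{
    int missed = 0;

    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;

        atomic_store(&interest, 0);
        atomic_store(&ready, 0);
        poll_saw_ready = false;
        callback_saw_interest = false;

        if (pthread_create(&a, NULL, modifier, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, waker, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        if (!poll_saw_ready && !callback_saw_interest)
            missed++;
    }
    return missed;
}
```

Under the C11 fence rules the two seq_cst fences are totally ordered, so at least one side must observe the other side's store, and count_missed_events() should return 0. Removing either fence allows the "both missed" outcome (store buffering is permitted even on x86), which corresponds to the lost event this patch addresses. Such a failure may not show up in a short run.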

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu>
Cc: netdev@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
---
 fs/eventpoll.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cd96649..39573ee 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1285,7 +1285,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	 * otherwise we might miss an event that happens between the
 	 * f_op->poll() call and the new event set registering.
 	 */
-	epi->event.events = event->events;
+	epi->event.events = event->events; /* need barrier below */
 	pt._key = event->events;
 	epi->event.data = event->data; /* protected by mtx */
 	if (epi->event.events & EPOLLWAKEUP) {
@@ -1296,6 +1296,26 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
 	}
 
 	/*
+	 * The following barrier has two effects:
+	 *
+	 * 1) Flush epi changes above to other CPUs.  This ensures
+	 *    we do not miss events from ep_poll_callback if an
+	 *    event occurs immediately after we call f_op->poll().
+	 *    We need this because we did not take ep->lock while
+	 *    changing epi above (but ep_poll_callback does take
+	 *    ep->lock).
+	 *
+	 * 2) We also need to ensure we do not miss _past_ events
+	 *    when calling f_op->poll().  This barrier also
+	 *    pairs with the barrier in wq_has_sleeper (see
+	 *    comments for wq_has_sleeper).
+	 *
+	 * This barrier will now guarantee ep_poll_callback or f_op->poll
+	 * (or both) will notice the readiness of an item.
+	 */
+	smp_mb();
+
+	/*
 	 * Get current event bits. We can safely use the file* here because
 	 * its usage count has been increased by the caller of this function.
 	 */
-- 
Eric Wong

