linux-kernel.vger.kernel.org archive mirror
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Davide Libenzi <davidel@xmailserver.org>,
	Eric Dumazet <eric.dumazet@gmail.com>, Greg KH <greg@kroah.com>,
	Jason Baron <jbaron@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Roland McGrath <roland@hack.frob.com>
Cc: Eugene Teo <eugeneteo@kernel.sg>,
	Maxime Bizon <mbizon@freebox.fr>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 4/4] epoll: ep_unregister_pollwait() can use the freed pwq->whead
Date: Wed, 22 Feb 2012 18:35:05 +0100
Message-ID: <20120222173505.GD7147@redhat.com>
In-Reply-To: <20120222173326.GA7139@redhat.com>

signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, so ep_unregister_pollwait()->remove_wait_queue()
is unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL before
remove_wait_queue(). We add the new ep_remove_wait_queue() helper
for this.

However, this needs more locking. ep_remove_wait_queue() should take
ep->lock first to avoid the race and pin pwq->whead, then it needs
pwq->whead->lock for __remove_wait_queue().

This can obviously AB-BA deadlock with wake_up()->ep_poll_callback(),
so ep_remove_wait_queue() does the nasty lock + trylock-or-retry dance.

Of course, this also assumes that it is safe to take ep->lock in
ep_unregister_pollwait() paths; as far as I can see, this is true.
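
For readers who want to see the lock + trylock-or-retry dance outside
the kernel, here is a minimal userspace sketch. It is not the patch
itself: it assumes pthread mutexes in place of spinlocks, and the names
wait_head, entry, owner, detach_entry() and release_from_waker() are
hypothetical stand-ins for wait_queue_head_t, eppoll_entry, eventpoll,
ep_remove_wait_queue() and the POLLFREE branch of ep_poll_callback().
Unlike the patch, the sketch samples the "detached" state while still
holding the outer lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wait_queue_head_t. */
struct wait_head {
	pthread_mutex_t lock;	/* plays the role of whead->lock */
	int nr_waiters;		/* stands in for the wait queue list */
};

/* Hypothetical stand-in for struct eppoll_entry. */
struct entry {
	struct wait_head *whead;	/* NULL once detached */
};

/* Hypothetical stand-in for struct eventpoll. */
struct owner {
	pthread_mutex_t lock;	/* plays the role of ep->lock */
};

/*
 * Detach e from its wait head. This path locks owner -> head, while the
 * waker path below locks head -> owner, so the head lock is only ever
 * try-locked here; on failure everything is dropped and we retry.
 */
static void detach_entry(struct owner *o, struct entry *e)
{
	for (;;) {
		bool done;

		pthread_mutex_lock(&o->lock);
		/* whead can be cleared by the waker path */
		if (e->whead && pthread_mutex_trylock(&e->whead->lock) == 0) {
			assert(e->whead->nr_waiters > 0);
			e->whead->nr_waiters--;
			pthread_mutex_unlock(&e->whead->lock);
			e->whead = NULL;
		}
		done = (e->whead == NULL);
		pthread_mutex_unlock(&o->lock);

		if (done)
			break;
		sched_yield();	/* userspace substitute for cpu_relax() */
	}
}

/*
 * Waker path: the caller already holds e->whead->lock, as a wake_up()
 * caller would, and takes the owner lock second.
 */
static void release_from_waker(struct owner *o, struct entry *e)
{
	pthread_mutex_lock(&o->lock);
	if (e->whead) {
		e->whead->nr_waiters--;
		e->whead = NULL;
	}
	pthread_mutex_unlock(&o->lock);
}
```

Whichever side clears ->whead first wins; the other side sees NULL
under the owner lock and never touches the head again.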

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/eventpoll.c |   43 +++++++++++++++++++++++++++++++++++++++----
 1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 442bedb..ac8bd15 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -320,6 +320,11 @@ static inline int ep_is_linked(struct list_head *p)
 	return !list_empty(p);
 }
 
+static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_t *p)
+{
+	return container_of(p, struct eppoll_entry, wait);
+}
+
 /* Get the "struct epitem" from a wait queue pointer */
 static inline struct epitem *ep_item_from_wait(wait_queue_t *p)
 {
@@ -467,6 +472,33 @@ static void ep_poll_safewake(wait_queue_head_t *wq)
 	put_cpu();
 }
 
+static void ep_remove_wait_queue(struct eventpoll *ep, struct eppoll_entry *pwq)
+{
+	for (;;) {
+		unsigned long flags;	/* probably unneeded */
+
+		spin_lock_irqsave(&ep->lock, flags);
+		/* can be cleared by ep_poll_callback(POLLFREE) */
+		if (!pwq->whead)
+			goto unlock;
+
+		/* _trylock to avoid the deadlock, retry if it fails */
+		if (!spin_trylock(&pwq->whead->lock))
+			goto unlock;
+
+		__remove_wait_queue(pwq->whead, &pwq->wait);
+		spin_unlock(&pwq->whead->lock);
+		pwq->whead = NULL;
+ unlock:
+		spin_unlock_irqrestore(&ep->lock, flags);
+
+		if (!pwq->whead)
+			break;
+
+		cpu_relax();
+	}
+}
+
 /*
  * This function unregisters poll callbacks from the associated file
  * descriptor.  Must be called with "mtx" held (or "epmutex" if called from
@@ -481,7 +513,7 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
 		pwq = list_first_entry(lsthead, struct eppoll_entry, llink);
 
 		list_del(&pwq->llink);
-		remove_wait_queue(pwq->whead, &pwq->wait);
+		ep_remove_wait_queue(ep, pwq);
 		kmem_cache_free(pwq_cache, pwq);
 	}
 }
@@ -844,9 +876,12 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 
 	spin_lock_irqsave(&ep->lock, flags);
 
-	/* the caller holds eppoll_entry->whead->lock */
-	if ((unsigned long)key & POLLFREE)
-		list_del_init(&wait->task_list);
+	if ((unsigned long)key & POLLFREE) {
+		struct eppoll_entry *pwq = ep_pwq_from_wait(wait);
+		/* the caller holds pwq->whead->lock */
+		__remove_wait_queue(pwq->whead, wait);
+		pwq->whead = NULL;
+	}
 
 	/*
 	 * If the event mask does not contain any poll(2) event, we consider the
-- 
1.5.5.1



Thread overview: 17+ messages
2012-02-22 17:33 ` [PATCH 1/4] signalfd: introduce signalfd_cleanup() Oleg Nesterov
2012-02-22 17:34 ` [PATCH 2/4] epoll: introduce POLLFREE for ep_poll_callback() Oleg Nesterov
2012-02-22 17:34 ` [PATCH 3/4] signalfd: signalfd_cleanup() can race with remove_wait_queue() Oleg Nesterov
2012-02-22 17:35 ` Oleg Nesterov [this message]
2012-02-23 15:44   ` [PATCH 4/4] epoll: ep_unregister_pollwait() can use the freed pwq->whead Oleg Nesterov
2012-02-23 22:17     ` Linus Torvalds
2012-02-24 19:06       ` [PATCH v2 0/2] signalfd/epoll fixes Oleg Nesterov
2012-02-24 19:07         ` [PATCH v2 1/2] epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() Oleg Nesterov
2012-02-29 19:57           ` Andy Lutomirski
2012-02-29 20:06             ` Oleg Nesterov
2012-02-29 20:16               ` Andrew Lutomirski
2012-03-01 19:26                 ` Oleg Nesterov
2012-02-24 19:07         ` [PATCH v2 2/2] epoll: ep_unregister_pollwait() can use the freed pwq->whead Oleg Nesterov
2012-02-24 20:23         ` [PATCH v2 0/2] signalfd/epoll fixes Linus Torvalds
2012-02-24 23:14           ` Linus Torvalds
2012-02-25 16:08             ` Oleg Nesterov
2012-02-25 19:00               ` Linus Torvalds
