linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] epoll: Fix spurious lockdep warnings
@ 2011-08-09 18:11 Nelson Elhage
  2011-09-08  0:04 ` Josh Boyer
  0 siblings, 1 reply; 13+ messages in thread
From: Nelson Elhage @ 2011-08-09 18:11 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Davide Libenzi, linux-kernel, Josh Boyer, linux-fsdevel,
	Paul Bolle, Nelson Elhage, stable

epoll can acquire recursively acquire ep->mtx on multiple "struct
eventpoll"s at once in the case where one epoll fd is monitoring
another epoll fd. This is perfectly OK, since we're careful about the
lock ordering, but it causes spurious lockdep warnings. Annotate the
recursion using mutex_lock_nested, and add a comment explaining the
nesting rules for good measure.

Recent versions of systemd are triggering this, and it can also be
demonstrated with the following trivial test program:

--------------------8<--------------------

int main(void) {
   int e1, e2;
   struct epoll_event evt = {
       .events = EPOLLIN
   };

   e1 = epoll_create1(0);
   e2 = epoll_create1(0);
   epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);
   return 0;
}
--------------------8<--------------------

Cc: stable@kernel.org
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Tested-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Nelson Elhage <nelhage@nelhage.com>
---
 fs/eventpoll.c |   25 ++++++++++++++++++-------
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f9cfd16..2acaf60 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -70,6 +70,15 @@
  * simultaneous inserts (A into B and B into A) from racing and
  * constructing a cycle without either insert observing that it is
  * going to.
+ * It is necessary to acquire multiple "ep->mtx"es at once in the
+ * case when one epoll fd is added to another. In this case, we
+ * always acquire the locks in the order of nesting (i.e. after
+ * epoll_ctl(e1, EPOLL_CTL_ADD, e2), e1->mtx will always be acquired
+ * before e2->mtx). Since we disallow cycles of epoll file
+ * descriptors, this ensures that the mutexes are well-ordered. In
+ * order to communicate this nesting to lockdep, when walking a tree
+ * of epoll file descriptors, we use the current recursion depth as
+ * the lockdep subkey.
  * It is possible to drop the "ep->mtx" and to use the global
  * mutex "epmutex" (together with "ep->lock") to have it working,
  * but having "ep->mtx" will make the interface more scalable.
@@ -464,13 +473,15 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
  * @ep: Pointer to the epoll private data structure.
  * @sproc: Pointer to the scan callback.
  * @priv: Private opaque data passed to the @sproc callback.
+ * @depth: The current depth of recursive f_op->poll calls.
  *
  * Returns: The same integer error code returned by the @sproc callback.
  */
 static int ep_scan_ready_list(struct eventpoll *ep,
 			      int (*sproc)(struct eventpoll *,
 					   struct list_head *, void *),
-			      void *priv)
+			      void *priv,
+			      int depth)
 {
 	int error, pwake = 0;
 	unsigned long flags;
@@ -481,7 +492,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
 	 * We need to lock this because we could be hit by
 	 * eventpoll_release_file() and epoll_ctl().
 	 */
-	mutex_lock(&ep->mtx);
+	mutex_lock_nested(&ep->mtx, depth);
 
 	/*
 	 * Steal the ready list, and re-init the original one to the
@@ -670,7 +681,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,
 
 static int ep_poll_readyevents_proc(void *priv, void *cookie, int call_nests)
 {
-	return ep_scan_ready_list(priv, ep_read_events_proc, NULL);
+	return ep_scan_ready_list(priv, ep_read_events_proc, NULL, call_nests + 1);
 }
 
 static unsigned int ep_eventpoll_poll(struct file *file, poll_table *wait)
@@ -737,7 +748,7 @@ void eventpoll_release_file(struct file *file)
 
 		ep = epi->ep;
 		list_del_init(&epi->fllink);
-		mutex_lock(&ep->mtx);
+		mutex_lock_nested(&ep->mtx, 0);
 		ep_remove(ep, epi);
 		mutex_unlock(&ep->mtx);
 	}
@@ -1134,7 +1145,7 @@ static int ep_send_events(struct eventpoll *ep,
 	esed.maxevents = maxevents;
 	esed.events = events;
 
-	return ep_scan_ready_list(ep, ep_send_events_proc, &esed);
+	return ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0);
 }
 
 static inline struct timespec ep_set_mstimeout(long ms)
@@ -1267,7 +1278,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
 	struct rb_node *rbp;
 	struct epitem *epi;
 
-	mutex_lock(&ep->mtx);
+	mutex_lock_nested(&ep->mtx, call_nests + 1);
 	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		if (unlikely(is_file_epoll(epi->ffd.file))) {
@@ -1409,7 +1420,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	}
 
 
-	mutex_lock(&ep->mtx);
+	mutex_lock_nested(&ep->mtx, 0);
 
 	/*
 	 * Try to lookup the file inside our RB tree, Since we grabbed "mtx"
-- 
1.7.4.44.gf9e72

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: recursive locking: epoll.
@ 2011-07-30 21:25 Nelson Elhage
  2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage
  0 siblings, 1 reply; 13+ messages in thread
From: Nelson Elhage @ 2011-07-30 21:25 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Linux Kernel,
	Dave Jones

On Sat, Jul 30, 2011 at 2:26 PM, Nelson Elhage <nelhage@ksplice.com> wrote:
> Oof, this is kinda ugly.
>
> I *believe* that as of
>
>  22bacca4 epoll: prevent creating circular epoll structures
>
> the epoll locking is correct, with the rule that two ep->mtx's can be
> locked recursively iff ep1 contains ep2 (possibly indirectly), and
> that if ep1 contains ep2, ep1 will always be locked first. Since
> 22bacca4 eliminated the possibility of epoll cycles, this means there
> is a well-defined lock order.
>
> I *think* that for any static configuration of epoll file descriptors,
> we can fix the problem by doing something like using the "call_nests"
> parameter passed by ep_call_nested as the lock subkey, but I haven't
> thought this through completely.
>
> However, since that lock order is subject to change, and even
> reversal, at runtime, I think the following (pathological) sequence of
> userspace calls will trigger lockdep warnings, even though there is
> never any risk of deadlock:

Thinking about this more, I think that the "call_nests" approach won't
have this problem, since that lets the locks change subclasses exactly
as necessary. Unless someone beats me to it, I'll see if I can put
such a patch together.

- Nelson

>
>  - Create epoll fds ep1 and ep2
>  - Add ep1 to ep2
>  - Do some operations that result in recursive locking
>  - Remove ep1 from ep2
>  - Add ep2 to ep1
>  - Do some operations that result in recursive locking
>
> In fact, that program should trigger warnings even if we did the
> pathological thing of using the address of the 'struct eventpoll' as
> the subclass [1], since it is *literally the same two locks* that are
> getting acquired in different orders at different times.
>
> I also don't see a way to simplify the epoll locking without adding
> more restrictions to how the API can be used. As far as I can tell,
> the situation really is just that nasty.
>
> - Nelson
>
> [1] Never mind that the "subclass" is an unsigned int, so we can't
>    even do that directly on 64-bit systems.
>
> On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote:
>> (Sent to the addresses get_maintainer.pl suggested and to Davide and
>> Nelson, because this is about code they cared about half a year ago.
>> CC'ed to the addresses involved until now.)
>>
>> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
>> > That number turned out to be 722472
>> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).
>>
>> 0) This seems to be a lockdep false alarm. The cause is an epoll
>> instance added to another epoll instance (ie, nesting epoll instances).
>> Apparently lockdep isn't supplied enough information to determine what's
>> going on here. Now there might be a number of ways to fix this. But
>> after having looked at this for quite some time and updating the above
>> bug report a number of times, I guessed that involving people outside
>> those tracking that report might move things forward towards a solution.
>> At least, I wasn't able to find a, well, clean solution.
>>
>> 1) The call chain triggering the warning with the nice
>>     *** DEADLOCK ***
>>
>> line can be summarized like this:
>>
>> sys_epoll_ctl
>>     mutex_lock                          epmutex
>>     ep_call_nested
>>         ep_loop_check_proc
>>             mutex_lock                      ep->mtx
>>             mutex_unlock                    ep->mtx
>>     mutex_lock                              ep->mtx
>>     ep_eventpoll_poll
>>         ep_ptable_queue_proc
>>         ep_call_nested
>>             ep_poll_readyevents_pro
>>                 ep_scan_ready_list
>>                     mutex_lock                  ep->mtx
>>                     ep_read_events_proc
>>                     mutex_unlock                ep->mtx
>>     mutex_unlock                            ep->mtx
>>     mutex_unlock                        epmutex
>>
>> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
>> recursive locking on ep->mtx. It is not supplied enough information to
>> determine that the lock is related to two separate epoll instances (the
>> outer instance and the nested instance). The solution appears to involve
>> supplying lockdep that information (ie, "lockdep annotation").
>>
>> 3) Please see the bugzilla.redhat.com report for further background.
>>
>>
>> Paul Bolle
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-09-15 15:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-09 18:11 [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage
2011-09-08  0:04 ` Josh Boyer
2011-09-13 20:22   ` Jason Baron
2011-09-13 21:16     ` Andrew Morton
2011-09-15 15:15       ` Jason Baron
  -- strict thread matches above, loose matches on Subject: below --
2011-07-30 21:25 recursive locking: epoll Nelson Elhage
2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage
2011-07-31 15:06   ` Paul Bolle
2011-07-31 15:16     ` Nelson Elhage
2011-07-31 21:39       ` Paul Bolle
2011-07-31 21:36   ` Paul Bolle
2011-07-31 21:48   ` Paul Bolle
2011-08-09 15:11   ` Josh Boyer
2011-08-09 17:36     ` Nelson Elhage

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).