* recursive locking: epoll. [not found] <4E1FF63F.4040704@gmail.com> @ 2011-07-15 21:04 ` Dave Jones 2011-07-20 8:05 ` Paul Bolle 2011-07-21 11:55 ` Paul Bolle 0 siblings, 2 replies; 15+ messages in thread From: Dave Jones @ 2011-07-15 21:04 UTC (permalink / raw) To: Linux Kernel; +Cc: davidel We just had a Fedora user report this lockdep trace. ============================================= [ INFO: possible recursive locking detected ] 3.0-0.rc7.git0.1.fc16.i686 #1 --------------------------------------------- systemd-logind/651 is trying to acquire lock: (&ep->mtx){+.+.+.}, at: [<c05285f1>] ep_scan_ready_list+0x32/0x154 but task is already holding lock: (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ep->mtx); lock(&ep->mtx); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by systemd-logind/651: #0: (epmutex){+.+.+.}, at: [<c0528a4b>] sys_epoll_ctl+0xbe/0x481 #1: (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481 stack backtrace: Pid: 651, comm: systemd-logind Not tainted 3.0-0.rc7.git0.1.fc16.i686 #1 Call Trace: [<c08490fe>] ? printk+0x2d/0x2f [<c046b2ef>] __lock_acquire+0x811/0xb63 [<c0407c77>] ? sched_clock+0x8/0xb [<c045d190>] ? sched_clock_local+0x10/0x18b [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 [<c046ba5e>] lock_acquire+0xad/0xe4 [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 [<c08506bd>] __mutex_lock_common+0x49/0x2ee [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 [<c04332e6>] ? __might_sleep+0x29/0xfb [<c046a912>] ? mark_lock+0x26/0x1f2 [<c0850a7c>] mutex_lock_nested+0x43/0x49 [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 [<c05285f1>] ep_scan_ready_list+0x32/0x154 [<c05281cb>] ? ep_remove+0x9b/0x9b [<c0528727>] ep_poll_readyevents_proc+0x14/0x16 [<c05283d6>] ep_call_nested.constprop.2+0x6d/0x9a [<c0528713>] ? ep_scan_ready_list+0x154/0x154 [<c05284d2>] ep_eventpoll_poll+0x45/0x55 [<c0528b8c>] sys_epoll_ctl+0x1ff/0x481 [<c05282fb>] ? ep_send_events_proc+0xd5/0xd5 [<c08521ac>] syscall_call+0x7/0xb ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: recursive locking: epoll. 2011-07-15 21:04 ` recursive locking: epoll Dave Jones @ 2011-07-20 8:05 ` Paul Bolle 2011-07-21 11:55 ` Paul Bolle 1 sibling, 0 replies; 15+ messages in thread From: Paul Bolle @ 2011-07-20 8:05 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, davidel On Fri, 2011-07-15 at 17:04 -0400, Dave Jones wrote: > We just had a Fedora user report this lockdep trace. 0) Does this have a bugzilla.redhat.com number? > ============================================= > [ INFO: possible recursive locking detected ] > 3.0-0.rc7.git0.1.fc16.i686 #1 > --------------------------------------------- > systemd-logind/651 is trying to acquire lock: > (&ep->mtx){+.+.+.}, at: [<c05285f1>] ep_scan_ready_list+0x32/0x154 > > but task is already holding lock: > (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&ep->mtx); > lock(&ep->mtx); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by systemd-logind/651: > #0: (epmutex){+.+.+.}, at: [<c0528a4b>] sys_epoll_ctl+0xbe/0x481 > #1: (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481 > > stack backtrace: > Pid: 651, comm: systemd-logind Not tainted 3.0-0.rc7.git0.1.fc16.i686 #1 > Call Trace: > [<c08490fe>] ? printk+0x2d/0x2f > [<c046b2ef>] __lock_acquire+0x811/0xb63 > [<c0407c77>] ? sched_clock+0x8/0xb > [<c045d190>] ? sched_clock_local+0x10/0x18b > [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 > [<c046ba5e>] lock_acquire+0xad/0xe4 > [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 > [<c08506bd>] __mutex_lock_common+0x49/0x2ee > [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 > [<c04332e6>] ? __might_sleep+0x29/0xfb > [<c046a912>] ? mark_lock+0x26/0x1f2 > [<c0850a7c>] mutex_lock_nested+0x43/0x49 > [<c05285f1>] ? ep_scan_ready_list+0x32/0x154 > [<c05285f1>] ep_scan_ready_list+0x32/0x154 > [<c05281cb>] ? ep_remove+0x9b/0x9b > [<c0528727>] ep_poll_readyevents_proc+0x14/0x16 > [<c05283d6>] ep_call_nested.constprop.2+0x6d/0x9a > [<c0528713>] ? ep_scan_ready_list+0x154/0x154 > [<c05284d2>] ep_eventpoll_poll+0x45/0x55 > [<c0528b8c>] sys_epoll_ctl+0x1ff/0x481 > [<c05282fb>] ? ep_send_events_proc+0xd5/0xd5 > [<c08521ac>] syscall_call+0x7/0xb 1) It seems I just ran into that deadlock too (on a Fedora Rawhide system, running a vanilla 3.0-0.rc7). I tried to capture it with a small digital camera, but using that camera for screenshots is apparently beyond my skills. (I could try capturing this message over a serial line, if needed.) 2) Luckily, I also hit a related warning rebooting into v2.6.39, which I could just cut and paste from dmesg's output: ============================================= [ INFO: possible recursive locking detected ] 2.6.39-0.local9.fc16.i686 #1 --------------------------------------------- systemd-logind/807 is trying to acquire lock: (&ep->mtx){+.+.+.}, at: [<c0524a05>] ep_scan_ready_list+0x32/0x154 but task is already holding lock: (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481 other info that might help us debug this: 2 locks held by systemd-logind/807: #0: (epmutex){+.+.+.}, at: [<c0524e5f>] sys_epoll_ctl+0xbe/0x481 #1: (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481 stack backtrace: Pid: 807, comm: systemd-logind Not tainted 2.6.39-0.local9.fc16.i686 #1 Call Trace: [<c080af85>] ? printk+0x2d/0x2f [<c04690b7>] __lock_acquire+0x78f/0xae1 [<c040790c>] ? sched_clock+0x8/0xb [<c045b858>] ? sched_clock_local+0x10/0x18b [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c046981e>] lock_acquire+0xbc/0xdc [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c08127f3>] __mutex_lock_common+0x4a/0x2f0 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c0432502>] ? __might_sleep+0x29/0xfb [<c0466a50>] ? trace_hardirqs_off+0xb/0xd [<c0812b4e>] mutex_lock_nested+0x39/0x3e [<c0524a05>] ? ep_scan_ready_list+0x32/0x154 [<c0524a05>] ep_scan_ready_list+0x32/0x154 [<c05245df>] ? ep_remove+0x9b/0x9b [<c0524b3b>] ep_poll_readyevents_proc+0x14/0x16 [<c05247ea>] ep_call_nested.constprop.2+0x6d/0x9a [<c0524b27>] ? ep_scan_ready_list+0x154/0x154 [<c05248e6>] ep_eventpoll_poll+0x45/0x55 [<c0524fa0>] sys_epoll_ctl+0x1ff/0x481 [<c052470f>] ? ep_send_events_proc+0xd5/0xd5 [<c0819fdf>] sysenter_do_call+0x12/0x38 3) Apparently this is something that is triggered by a brand new version of systemd (systemd-30-1.fc16.i686, compiled on July 13th, which I installed just yesterday, July 19th), as I do not recall seeing this before. 4) Feel free to prod me for more information. Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: recursive locking: epoll. 2011-07-15 21:04 ` recursive locking: epoll Dave Jones 2011-07-20 8:05 ` Paul Bolle @ 2011-07-21 11:55 ` Paul Bolle 2011-07-29 18:50 ` Paul Bolle 1 sibling, 1 reply; 15+ messages in thread From: Paul Bolle @ 2011-07-21 11:55 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel, davidel On Wed, 2011-07-20 at 10:05 +0200, Paul Bolle wrote: > 0) Does this have a bugzilla.redhat.com number? That number turned out to be 722472 ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ). Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: recursive locking: epoll. 2011-07-21 11:55 ` Paul Bolle @ 2011-07-29 18:50 ` Paul Bolle 2011-07-30 18:26 ` Nelson Elhage 0 siblings, 1 reply; 15+ messages in thread From: Paul Bolle @ 2011-07-29 18:50 UTC (permalink / raw) To: Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage Cc: Linux Kernel, davidel, Dave Jones (Sent to the addresses get_maintainer.pl suggested and to Davide and Nelson, because this is about code they cared about half a year ago. CC'ed to the addresses involved until now.) On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote: > That number turned out to be 722472 > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ). 0) This seems to be a lockdep false alarm. The cause is an epoll instance added to another epoll instance (ie, nesting epoll instances). Apparently lockdep isn't supplied enough information to determine what's going on here. Now there might be a number of ways to fix this. But after having looked at this for quite some time and updating the above bug report a number of times, I guessed that involving people outside those tracking that report might move things forward towards a solution. At least, I wasn't able to find a, well, clean solution. 1) The call chain triggering the warning with the nice *** DEADLOCK *** line can be summarized like this: sys_epoll_ctl mutex_lock epmutex ep_call_nested ep_loop_check_proc mutex_lock ep->mtx mutex_unlock ep->mtx mutex_lock ep->mtx ep_eventpoll_poll ep_ptable_queue_proc ep_call_nested ep_poll_readyevents_pro ep_scan_ready_list mutex_lock ep->mtx ep_read_events_proc mutex_unlock ep->mtx mutex_unlock ep->mtx mutex_unlock epmutex 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices recursive locking on ep->mtx. It is not supplied enough information to determine that the lock is related to two separate epoll instances (the outer instance and the nested instance). The solution appears to involve supplying lockdep that information (ie, "lockdep annotation"). 3) Please see the bugzilla.redhat.com report for further background. Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: recursive locking: epoll. 2011-07-29 18:50 ` Paul Bolle @ 2011-07-30 18:26 ` Nelson Elhage 2011-07-30 21:25 ` Nelson Elhage 0 siblings, 1 reply; 15+ messages in thread From: Nelson Elhage @ 2011-07-30 18:26 UTC (permalink / raw) To: Paul Bolle Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Linux Kernel, Dave Jones Oof, this is kinda ugly. I *believe* that as of 22bacca4 epoll: prevent creating circular epoll structures the epoll locking is correct, with the rule that two ep->mtx's can be locked recursively iff ep1 contains ep2 (possibly indirectly), and that if ep1 contains ep2, ep1 will always be locked first. Since 22bacca4 eliminated the possibility of epoll cycles, this means there is a well-defined lock order. I *think* that for any static configuration of epoll file descriptors, we can fix the problem by doing something like using the "call_nests" parameter passed by ep_call_nested as the lock subkey, but I haven't thought this through completely. However, since that lock order is subject to change, and even reversal, at runtime, I think the following (pathological) sequence of userspace calls will trigger lockdep warnings, even though there is never any risk of deadlock: - Create epoll fds ep1 and ep2 - Add ep1 to ep2 - Do some operations that result in recursive locking - Remove ep1 from ep2 - Add ep2 to ep1 - Do some operations that result in recursive locking In fact, that program should trigger warnings even if we did the pathological thing of using the address of the 'struct eventpoll' as the subclass [1], since it is *literally the same two locks* that are getting acquired in different orders at different times. I also don't see a way to simplify the epoll locking without adding more restrictions to how the API can be used. As far as I can tell, the situation really is just that nasty. - Nelson [1] Never mind that the "subclass" is an unsigned int, so we can't even do that directly on 64-bit systems. On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote: > (Sent to the addresses get_maintainer.pl suggested and to Davide and > Nelson, because this is about code they cared about half a year ago. > CC'ed to the addresses involved until now.) > > On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote: > > That number turned out to be 722472 > > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ). > > 0) This seems to be a lockdep false alarm. The cause is an epoll > instance added to another epoll instance (ie, nesting epoll instances). > Apparently lockdep isn't supplied enough information to determine what's > going on here. Now there might be a number of ways to fix this. But > after having looked at this for quite some time and updating the above > bug report a number of times, I guessed that involving people outside > those tracking that report might move things forward towards a solution. > At least, I wasn't able to find a, well, clean solution. > > 1) The call chain triggering the warning with the nice > *** DEADLOCK *** > > line can be summarized like this: > > sys_epoll_ctl > mutex_lock epmutex > ep_call_nested > ep_loop_check_proc > mutex_lock ep->mtx > mutex_unlock ep->mtx > mutex_lock ep->mtx > ep_eventpoll_poll > ep_ptable_queue_proc > ep_call_nested > ep_poll_readyevents_pro > ep_scan_ready_list > mutex_lock ep->mtx > ep_read_events_proc > mutex_unlock ep->mtx > mutex_unlock ep->mtx > mutex_unlock epmutex > > 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices > recursive locking on ep->mtx. It is not supplied enough information to > determine that the lock is related to two separate epoll instances (the > outer instance and the nested instance). The solution appears to involve > supplying lockdep that information (ie, "lockdep annotation"). > > 3) Please see the bugzilla.redhat.com report for further background. > > > Paul Bolle > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: recursive locking: epoll. 2011-07-30 18:26 ` Nelson Elhage @ 2011-07-30 21:25 ` Nelson Elhage 0 siblings, 0 replies; 15+ messages in thread From: Nelson Elhage @ 2011-07-30 21:25 UTC (permalink / raw) To: Paul Bolle Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Linux Kernel, Dave Jones On Sat, Jul 30, 2011 at 2:26 PM, Nelson Elhage <nelhage@ksplice.com> wrote: > Oof, this is kinda ugly. > > I *believe* that as of > > 22bacca4 epoll: prevent creating circular epoll structures > > the epoll locking is correct, with the rule that two ep->mtx's can be > locked recursively iff ep1 contains ep2 (possibly indirectly), and > that if ep1 contains ep2, ep1 will always be locked first. Since > 22bacca4 eliminated the possibility of epoll cycles, this means there > is a well-defined lock order. > > I *think* that for any static configuration of epoll file descriptors, > we can fix the problem by doing something like using the "call_nests" > parameter passed by ep_call_nested as the lock subkey, but I haven't > thought this through completely. > > However, since that lock order is subject to change, and even > reversal, at runtime, I think the following (pathological) sequence of > userspace calls will trigger lockdep warnings, even though there is > never any risk of deadlock: Thinking about this more, I think that the "call_nests" approach won't have this problem, since that lets the locks change subclasses exactly as necessary. Unless someone beats me to it, I'll see if I can put such a patch together. - Nelson > > - Create epoll fds ep1 and ep2 > - Add ep1 to ep2 > - Do some operations that result in recursive locking > - Remove ep1 from ep2 > - Add ep2 to ep1 > - Do some operations that result in recursive locking > > In fact, that program should trigger warnings even if we did the > pathological thing of using the address of the 'struct eventpoll' as > the subclass [1], since it is *literally the same two locks* that are > getting acquired in different orders at different times. > > I also don't see a way to simplify the epoll locking without adding > more restrictions to how the API can be used. As far as I can tell, > the situation really is just that nasty. > > - Nelson > > [1] Never mind that the "subclass" is an unsigned int, so we can't > even do that directly on 64-bit systems. > > On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote: >> (Sent to the addresses get_maintainer.pl suggested and to Davide and >> Nelson, because this is about code they cared about half a year ago. >> CC'ed to the addresses involved until now.) >> >> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote: >> > That number turned out to be 722472 >> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ). >> >> 0) This seems to be a lockdep false alarm. The cause is an epoll >> instance added to another epoll instance (ie, nesting epoll instances). >> Apparently lockdep isn't supplied enough information to determine what's >> going on here. Now there might be a number of ways to fix this. But >> after having looked at this for quite some time and updating the above >> bug report a number of times, I guessed that involving people outside >> those tracking that report might move things forward towards a solution. >> At least, I wasn't able to find a, well, clean solution. >> >> 1) The call chain triggering the warning with the nice >> *** DEADLOCK *** >> >> line can be summarized like this: >> >> sys_epoll_ctl >> mutex_lock epmutex >> ep_call_nested >> ep_loop_check_proc >> mutex_lock ep->mtx >> mutex_unlock ep->mtx >> mutex_lock ep->mtx >> ep_eventpoll_poll >> ep_ptable_queue_proc >> ep_call_nested >> ep_poll_readyevents_pro >> ep_scan_ready_list >> mutex_lock ep->mtx >> ep_read_events_proc >> mutex_unlock ep->mtx >> mutex_unlock ep->mtx >> mutex_unlock epmutex >> >> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices >> recursive locking on ep->mtx. It is not supplied enough information to >> determine that the lock is related to two separate epoll instances (the >> outer instance and the nested instance). The solution appears to involve >> supplying lockdep that information (ie, "lockdep annotation"). >> >> 3) Please see the bugzilla.redhat.com report for further background. >> >> >> Paul Bolle >> > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: recursive locking: epoll. @ 2011-07-30 21:25 ` Nelson Elhage 0 siblings, 0 replies; 15+ messages in thread From: Nelson Elhage @ 2011-07-30 21:25 UTC (permalink / raw) To: Paul Bolle Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Linux Kernel, Dave Jones On Sat, Jul 30, 2011 at 2:26 PM, Nelson Elhage <nelhage@ksplice.com> wrote: > Oof, this is kinda ugly. > > I *believe* that as of > > 22bacca4 epoll: prevent creating circular epoll structures > > the epoll locking is correct, with the rule that two ep->mtx's can be > locked recursively iff ep1 contains ep2 (possibly indirectly), and > that if ep1 contains ep2, ep1 will always be locked first. Since > 22bacca4 eliminated the possibility of epoll cycles, this means there > is a well-defined lock order. > > I *think* that for any static configuration of epoll file descriptors, > we can fix the problem by doing something like using the "call_nests" > parameter passed by ep_call_nested as the lock subkey, but I haven't > thought this through completely. > > However, since that lock order is subject to change, and even > reversal, at runtime, I think the following (pathological) sequence of > userspace calls will trigger lockdep warnings, even though there is > never any risk of deadlock: Thinking about this more, I think that the "call_nests" approach won't have this problem, since that lets the locks change subclasses exactly as necessary. Unless someone beats me to it, I'll see if I can put such a patch together. - Nelson > > - Create epoll fds ep1 and ep2 > - Add ep1 to ep2 > - Do some operations that result in recursive locking > - Remove ep1 from ep2 > - Add ep2 to ep1 > - Do some operations that result in recursive locking > > In fact, that program should trigger warnings even if we did the > pathological thing of using the address of the 'struct eventpoll' as > the subclass [1], since it is *literally the same two locks* that are > getting acquired in different orders at different times. > > I also don't see a way to simplify the epoll locking without adding > more restrictions to how the API can be used. As far as I can tell, > the situation really is just that nasty. > > - Nelson > > [1] Never mind that the "subclass" is an unsigned int, so we can't > even do that directly on 64-bit systems. > > On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote: >> (Sent to the addresses get_maintainer.pl suggested and to Davide and >> Nelson, because this is about code they cared about half a year ago. >> CC'ed to the addresses involved until now.) >> >> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote: >> > That number turned out to be 722472 >> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ). >> >> 0) This seems to be a lockdep false alarm. The cause is an epoll >> instance added to another epoll instance (ie, nesting epoll instances). >> Apparently lockdep isn't supplied enough information to determine what's >> going on here. Now there might be a number of ways to fix this. But >> after having looked at this for quite some time and updating the above >> bug report a number of times, I guessed that involving people outside >> those tracking that report might move things forward towards a solution. >> At least, I wasn't able to find a, well, clean solution. >> >> 1) The call chain triggering the warning with the nice >> *** DEADLOCK *** >> >> line can be summarized like this: >> >> sys_epoll_ctl >> mutex_lock epmutex >> ep_call_nested >> ep_loop_check_proc >> mutex_lock ep->mtx >> mutex_unlock ep->mtx >> mutex_lock ep->mtx >> ep_eventpoll_poll >> ep_ptable_queue_proc >> ep_call_nested >> ep_poll_readyevents_pro >> ep_scan_ready_list >> mutex_lock ep->mtx >> ep_read_events_proc >> mutex_unlock ep->mtx >> mutex_unlock ep->mtx >> mutex_unlock epmutex >> >> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices >> recursive locking on ep->mtx. It is not supplied enough information to >> determine that the lock is related to two separate epoll instances (the >> outer instance and the nested instance). The solution appears to involve >> supplying lockdep that information (ie, "lockdep annotation"). >> >> 3) Please see the bugzilla.redhat.com report for further background. >> >> >> Paul Bolle >> > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-30 21:25 ` Nelson Elhage (?) @ 2011-07-30 22:30 ` Nelson Elhage 2011-07-31 15:06 ` Paul Bolle ` (3 more replies) -1 siblings, 4 replies; 15+ messages in thread From: Nelson Elhage @ 2011-07-30 22:30 UTC (permalink / raw) To: Paul Bolle; +Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage epoll can acquire multiple ep->mutex on multiple "struct eventpoll"s at once in the case where one epoll fd is monitoring another epoll fd. This is perfectly OK, since we're careful about the lock ordering, but causes spurious lockdep warnings. Annotate the recursion using mutex_lock_nested, and add a comment explaining the nesting rules for good measure. Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Nelson Elhage <nelhage@nelhage.com> --- I've tested this on a synthetic epoll test case, that just adds e1 to e2 and then does an epoll_wait(). I verified that it caused lockdep problems on 3.0 and that this patch fixed it, but I haven't done more extensive testing. Paul, are you able to test systemd against this? fs/eventpoll.c | 25 ++++++++++++++++++------- 1 files changed, 18 insertions(+), 7 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index f9cfd16..0cb7bc6 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -76,6 +76,15 @@ * Events that require holding "epmutex" are very rare, while for * normal operations the epoll private "ep->mtx" will guarantee * a better scalability. + * It is possible to acquire multiple "ep->mtx"es at once in the case + * when one epoll fd is added to another. In this case, we always + * acquire the locks in the order of nesting (i.e. after epoll_ctl(e1, + * EPOLL_CTL_ADD, e2), e1->mtx will always be acquired before + * e2->mtx). Since we disallow cycles of epoll file descriptors, this + * ensures that the mutexes are well-ordered. In order to communicate + * this nesting to lockdep, when walking a tree of epoll file + * descriptors, we use the current recursion depth as the lockdep + * subkey. */ /* Epoll private bits inside the event mask */ @@ -464,13 +473,15 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi) * @ep: Pointer to the epoll private data structure. * @sproc: Pointer to the scan callback. * @priv: Private opaque data passed to the @sproc callback. + * @depth: The current depth of recursive f_op->poll calls. * * Returns: The same integer error code returned by the @sproc callback. */ static int ep_scan_ready_list(struct eventpoll *ep, int (*sproc)(struct eventpoll *, struct list_head *, void *), - void *priv) + void *priv, + int depth) { int error, pwake = 0; unsigned long flags; @@ -481,7 +492,7 @@ static int ep_scan_ready_list(struct eventpoll *ep, * We need to lock this because we could be hit by * eventpoll_release_file() and epoll_ctl(). */ - mutex_lock(&ep->mtx); + mutex_lock_nested(&ep->mtx, depth); /* * Steal the ready list, and re-init the original one to the @@ -670,7 +681,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head, static int ep_poll_readyevents_proc(void *priv, void *cookie, int call_nests) { - return ep_scan_ready_list(priv, ep_read_events_proc, NULL); + return ep_scan_ready_list(priv, ep_read_events_proc, NULL, call_nests + 1); } static unsigned int ep_eventpoll_poll(struct file *file, poll_table *wait) @@ -737,7 +748,7 @@ void eventpoll_release_file(struct file *file) ep = epi->ep; list_del_init(&epi->fllink); - mutex_lock(&ep->mtx); + mutex_lock_nested(&ep->mtx, 0); ep_remove(ep, epi); mutex_unlock(&ep->mtx); } @@ -1134,7 +1145,7 @@ static int ep_send_events(struct eventpoll *ep, esed.maxevents = maxevents; esed.events = events; - return ep_scan_ready_list(ep, ep_send_events_proc, &esed); + return ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0); } static inline struct timespec ep_set_mstimeout(long ms) @@ -1267,7 +1278,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests) struct rb_node *rbp; struct epitem *epi; - mutex_lock(&ep->mtx); + mutex_lock_nested(&ep->mtx, call_nests + 1); for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) { epi = rb_entry(rbp, struct epitem, rbn); if (unlikely(is_file_epoll(epi->ffd.file))) { @@ -1409,7 +1420,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd, } - mutex_lock(&ep->mtx); + mutex_lock_nested(&ep->mtx, 0); /* * Try to lookup the file inside our RB tree, Since we grabbed "mtx" -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage @ 2011-07-31 15:06 ` Paul Bolle 2011-07-31 15:16 ` Nelson Elhage 2011-07-31 21:36 ` Paul Bolle ` (2 subsequent siblings) 3 siblings, 1 reply; 15+ messages in thread From: Paul Bolle @ 2011-07-31 15:06 UTC (permalink / raw) To: Nelson Elhage Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage On Sat, 2011-07-30 at 18:30 -0400, Nelson Elhage wrote: > I've tested this on a synthetic epoll test case, that just adds e1 to > e2 and then does an epoll_wait(). I verified that it caused lockdep > problems on 3.0 and that this patch fixed it, but I haven't done more > extensive testing. I was unable to come up with such a test case myself. Could you perhaps share it? (Maybe that test case could even be added tot the commit message. I seem to remember an earlier commit that you were involved with which had a test case added. That helped me understand eventpoll's interface - at least enough to pinpoint the problem. Looking at a test case is much easier than grepping through a program like systemd. Issues in non-test case programs tend to increase the bug hunting challenge: one is faced with an issue in an interface one hasn't used before triggered by a program one hasn't studied before.) > Paul, are you able to test systemd against this? I hope to do so shortly (ie, in the next 24 hours). Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-31 15:06 ` Paul Bolle @ 2011-07-31 15:16 ` Nelson Elhage 2011-07-31 21:39 ` Paul Bolle 0 siblings, 1 reply; 15+ messages in thread From: Nelson Elhage @ 2011-07-31 15:16 UTC (permalink / raw) To: Paul Bolle; +Cc: Alexander Viro, linux-fsdevel, Davide Libenzi Sure -- it's quite simple if you've worked with epoll before. This cut-down version is even simpler than the previous one I had, and I'd be happy to add it to the commit message. --------------------8<-------------------- #include <sys/epoll.h> int main(void) { int e1, e2; struct epoll_event evt = { .events = EPOLLIN }; e1 = epoll_create1(0); e2 = epoll_create1(0); epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); return 0; } --------------------8<-------------------- - Nelson On Sun, Jul 31, 2011 at 05:06:10PM +0200, Paul Bolle wrote: > On Sat, 2011-07-30 at 18:30 -0400, Nelson Elhage wrote: > > I've tested this on a synthetic epoll test case, that just adds e1 to > > e2 and then does an epoll_wait(). I verified that it caused lockdep > > problems on 3.0 and that this patch fixed it, but I haven't done more > > extensive testing. > > I was unable to come up with such a test case myself. Could you perhaps > share it? > > (Maybe that test case could even be added tot the commit message. I seem > to remember an earlier commit that you were involved with which had a > test case added. That helped me understand eventpoll's interface - at > least enough to pinpoint the problem. Looking at a test case is much > easier than grepping through a program like systemd. Issues in non-test > case programs tend to increase the bug hunting challenge: one is faced > with an issue in an interface one hasn't used before triggered by a > program one hasn't studied before.) > > > Paul, are you able to test systemd against this? > > I hope to do so shortly (ie, in the next 24 hours). > > > Paul Bolle > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-31 15:16 ` Nelson Elhage @ 2011-07-31 21:39 ` Paul Bolle 0 siblings, 0 replies; 15+ messages in thread From: Paul Bolle @ 2011-07-31 21:39 UTC (permalink / raw) To: Nelson Elhage; +Cc: Alexander Viro, linux-fsdevel, Davide Libenzi On Sun, 2011-07-31 at 11:16 -0400, Nelson Elhage wrote: > Sure -- it's quite simple if you've worked with epoll before. This > cut-down version is even simpler than the previous one I had, and I'd > be happy to add it to the commit message. > > --------------------8<-------------------- > #include <sys/epoll.h> > > int main(void) { > int e1, e2; > struct epoll_event evt = { > .events = EPOLLIN > }; > > e1 = epoll_create1(0); > e2 = epoll_create1(0); > epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); > return 0; > } > --------------------8<-------------------- That looks rather obvious. It's, well, interesting to see how I managed to miss that. Thanks, Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage 2011-07-31 15:06 ` Paul Bolle @ 2011-07-31 21:36 ` Paul Bolle 2011-07-31 21:48 ` Paul Bolle 2011-08-09 15:11 ` Josh Boyer 3 siblings, 0 replies; 15+ messages in thread From: Paul Bolle @ 2011-07-31 21:36 UTC (permalink / raw) To: Nelson Elhage Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage On Sat, 2011-07-30 at 18:30 -0400, Nelson Elhage wrote: 0) Nit: drop the period at the end of the summary > epoll can acquire multiple ep->mutex on multiple "struct eventpoll"s 1) Ditto: it's "ep->mtx", but perhaps better something like "[...] recursive mutexes on [...]" > Paul, are you able to test systemd against this? Also works as expected with systemd-30 (which is apparently the first version that uses recursive epoll instances in systemd-logind). Thanks, Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage 2011-07-31 15:06 ` Paul Bolle 2011-07-31 21:36 ` Paul Bolle @ 2011-07-31 21:48 ` Paul Bolle 2011-08-09 15:11 ` Josh Boyer 3 siblings, 0 replies; 15+ messages in thread From: Paul Bolle @ 2011-07-31 21:48 UTC (permalink / raw) To: Nelson Elhage Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage On Sun, 2011-07-31 at 23:36 +0200, Paul Bolle wrote: > Also works as expected with systemd-30 (which is apparently the first > version that uses recursive epoll instances in systemd-logind). I forgot to add that maybe (the final version of) this patch should be sent to "stable". Not sure for which previous releases it could be relevant, though. Paul Bolle ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage ` (2 preceding siblings ...) 2011-07-31 21:48 ` Paul Bolle @ 2011-08-09 15:11 ` Josh Boyer 2011-08-09 17:36 ` Nelson Elhage 3 siblings, 1 reply; 15+ messages in thread From: Josh Boyer @ 2011-08-09 15:11 UTC (permalink / raw) To: Nelson Elhage Cc: Paul Bolle, Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage On Sat, Jul 30, 2011 at 06:30:22PM -0400, Nelson Elhage wrote: > epoll can acquire multiple ep->mutex on multiple "struct eventpoll"s > at once in the case where one epoll fd is monitoring another epoll > fd. This is perfectly OK, since we're careful about the lock ordering, > but causes spurious lockdep warnings. Annotate the recursion using > mutex_lock_nested, and add a comment explaining the nesting rules for > good measure. > > Reported-by: Paul Bolle <pebolle@tiscali.nl> > Signed-off-by: Nelson Elhage <nelhage@nelhage.com> We were seeing lockdep warnings for this in rawhide until I applied the patch. Would it be possible to get it into 3.1 still? josh ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] epoll: Fix spurious lockdep warnings. 2011-08-09 15:11 ` Josh Boyer @ 2011-08-09 17:36 ` Nelson Elhage 0 siblings, 0 replies; 15+ messages in thread From: Nelson Elhage @ 2011-08-09 17:36 UTC (permalink / raw) To: Josh Boyer Cc: Nelson Elhage, Paul Bolle, Alexander Viro, linux-fsdevel, Davide Libenzi I think so. I'll resend to upstream for inclusion in 3.1 (CC stable) with Paul's comments later today. - Nelson On Tue, Aug 9, 2011 at 11:11 AM, Josh Boyer <jwboyer@redhat.com> wrote: > On Sat, Jul 30, 2011 at 06:30:22PM -0400, Nelson Elhage wrote: >> epoll can acquire multiple ep->mutex on multiple "struct eventpoll"s >> at once in the case where one epoll fd is monitoring another epoll >> fd. This is perfectly OK, since we're careful about the lock ordering, >> but causes spurious lockdep warnings. Annotate the recursion using >> mutex_lock_nested, and add a comment explaining the nesting rules for >> good measure. >> >> Reported-by: Paul Bolle <pebolle@tiscali.nl> >> Signed-off-by: Nelson Elhage <nelhage@nelhage.com> > > We were seeing lockdep warnings for this in rawhide until I applied the > patch. Would it be possible to get it into 3.1 still? > > josh > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-08-09 17:36 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <4E1FF63F.4040704@gmail.com>
2011-07-15 21:04 ` recursive locking: epoll Dave Jones
2011-07-20 8:05 ` Paul Bolle
2011-07-21 11:55 ` Paul Bolle
2011-07-29 18:50 ` Paul Bolle
2011-07-30 18:26 ` Nelson Elhage
2011-07-30 21:25 ` Nelson Elhage
2011-07-30 21:25 ` Nelson Elhage
2011-07-30 22:30 ` [PATCH] epoll: Fix spurious lockdep warnings Nelson Elhage
2011-07-31 15:06 ` Paul Bolle
2011-07-31 15:16 ` Nelson Elhage
2011-07-31 21:39 ` Paul Bolle
2011-07-31 21:36 ` Paul Bolle
2011-07-31 21:48 ` Paul Bolle
2011-08-09 15:11 ` Josh Boyer
2011-08-09 17:36 ` Nelson Elhage
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.