public inbox for linux-kernel@vger.kernel.org
* recursive locking: epoll.
       [not found] <4E1FF63F.4040704@gmail.com>
@ 2011-07-15 21:04 ` Dave Jones
  2011-07-20  8:05   ` Paul Bolle
  2011-07-21 11:55   ` Paul Bolle
  0 siblings, 2 replies; 6+ messages in thread
From: Dave Jones @ 2011-07-15 21:04 UTC (permalink / raw)
  To: Linux Kernel; +Cc: davidel

We just had a Fedora user report this lockdep trace.

  =============================================
  [ INFO: possible recursive locking detected ]
  3.0-0.rc7.git0.1.fc16.i686 #1
  ---------------------------------------------
  systemd-logind/651 is trying to acquire lock:
   (&ep->mtx){+.+.+.}, at: [<c05285f1>] ep_scan_ready_list+0x32/0x154
 
  but task is already holding lock:
   (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
 
  other info that might help us debug this:
   Possible unsafe locking scenario:
 
         CPU0
         ----
    lock(&ep->mtx);
    lock(&ep->mtx);
 
   *** DEADLOCK ***
 
   May be due to missing lock nesting notation
 
  2 locks held by systemd-logind/651:
   #0:  (epmutex){+.+.+.}, at: [<c0528a4b>] sys_epoll_ctl+0xbe/0x481
   #1:  (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
 
  stack backtrace:
  Pid: 651, comm: systemd-logind Not tainted 3.0-0.rc7.git0.1.fc16.i686 #1
  Call Trace:
   [<c08490fe>] ? printk+0x2d/0x2f
   [<c046b2ef>] __lock_acquire+0x811/0xb63
   [<c0407c77>] ? sched_clock+0x8/0xb
   [<c045d190>] ? sched_clock_local+0x10/0x18b
   [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
   [<c046ba5e>] lock_acquire+0xad/0xe4
   [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
   [<c08506bd>] __mutex_lock_common+0x49/0x2ee
   [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
   [<c04332e6>] ? __might_sleep+0x29/0xfb
   [<c046a912>] ? mark_lock+0x26/0x1f2
   [<c0850a7c>] mutex_lock_nested+0x43/0x49
   [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
   [<c05285f1>] ep_scan_ready_list+0x32/0x154
   [<c05281cb>] ? ep_remove+0x9b/0x9b
   [<c0528727>] ep_poll_readyevents_proc+0x14/0x16
   [<c05283d6>] ep_call_nested.constprop.2+0x6d/0x9a
   [<c0528713>] ? ep_scan_ready_list+0x154/0x154
   [<c05284d2>] ep_eventpoll_poll+0x45/0x55
   [<c0528b8c>] sys_epoll_ctl+0x1ff/0x481
   [<c05282fb>] ? ep_send_events_proc+0xd5/0xd5
   [<c08521ac>] syscall_call+0x7/0xb


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: recursive locking: epoll.
  2011-07-15 21:04 ` recursive locking: epoll Dave Jones
@ 2011-07-20  8:05   ` Paul Bolle
  2011-07-21 11:55   ` Paul Bolle
  1 sibling, 0 replies; 6+ messages in thread
From: Paul Bolle @ 2011-07-20  8:05 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, davidel

On Fri, 2011-07-15 at 17:04 -0400, Dave Jones wrote:
> We just had a Fedora user report this lockdep trace.

0) Does this have a bugzilla.redhat.com number?

>   =============================================
>   [ INFO: possible recursive locking detected ]
>   3.0-0.rc7.git0.1.fc16.i686 #1
>   ---------------------------------------------
>   systemd-logind/651 is trying to acquire lock:
>    (&ep->mtx){+.+.+.}, at: [<c05285f1>] ep_scan_ready_list+0x32/0x154
>  
>   but task is already holding lock:
>    (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
>  
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
>  
>          CPU0
>          ----
>     lock(&ep->mtx);
>     lock(&ep->mtx);
>  
>    *** DEADLOCK ***
>  
>    May be due to missing lock nesting notation
>  
>   2 locks held by systemd-logind/651:
>    #0:  (epmutex){+.+.+.}, at: [<c0528a4b>] sys_epoll_ctl+0xbe/0x481
>    #1:  (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
>  
>   stack backtrace:
>   Pid: 651, comm: systemd-logind Not tainted 3.0-0.rc7.git0.1.fc16.i686 #1
>   Call Trace:
>    [<c08490fe>] ? printk+0x2d/0x2f
>    [<c046b2ef>] __lock_acquire+0x811/0xb63
>    [<c0407c77>] ? sched_clock+0x8/0xb
>    [<c045d190>] ? sched_clock_local+0x10/0x18b
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c046ba5e>] lock_acquire+0xad/0xe4
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c08506bd>] __mutex_lock_common+0x49/0x2ee
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c04332e6>] ? __might_sleep+0x29/0xfb
>    [<c046a912>] ? mark_lock+0x26/0x1f2
>    [<c0850a7c>] mutex_lock_nested+0x43/0x49
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c05285f1>] ep_scan_ready_list+0x32/0x154
>    [<c05281cb>] ? ep_remove+0x9b/0x9b
>    [<c0528727>] ep_poll_readyevents_proc+0x14/0x16
>    [<c05283d6>] ep_call_nested.constprop.2+0x6d/0x9a
>    [<c0528713>] ? ep_scan_ready_list+0x154/0x154
>    [<c05284d2>] ep_eventpoll_poll+0x45/0x55
>    [<c0528b8c>] sys_epoll_ctl+0x1ff/0x481
>    [<c05282fb>] ? ep_send_events_proc+0xd5/0xd5
>    [<c08521ac>] syscall_call+0x7/0xb

1) It seems I just ran into that deadlock too (on a Fedora Rawhide
system, running a vanilla 3.0-0.rc7). I tried to capture it with a small
digital camera, but using that camera for screenshots is apparently
beyond my skills. (I could try capturing this message over a serial
line, if needed.)

2) Luckily, I also hit a related warning rebooting into v2.6.39, which I
could just cut and paste from dmesg's output:

=============================================
[ INFO: possible recursive locking detected ]
2.6.39-0.local9.fc16.i686 #1
---------------------------------------------
systemd-logind/807 is trying to acquire lock:
 (&ep->mtx){+.+.+.}, at: [<c0524a05>] ep_scan_ready_list+0x32/0x154

but task is already holding lock:
 (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481

other info that might help us debug this:
2 locks held by systemd-logind/807:
 #0:  (epmutex){+.+.+.}, at: [<c0524e5f>] sys_epoll_ctl+0xbe/0x481
 #1:  (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481

stack backtrace:
Pid: 807, comm: systemd-logind Not tainted 2.6.39-0.local9.fc16.i686 #1
Call Trace:
 [<c080af85>] ? printk+0x2d/0x2f
 [<c04690b7>] __lock_acquire+0x78f/0xae1
 [<c040790c>] ? sched_clock+0x8/0xb
 [<c045b858>] ? sched_clock_local+0x10/0x18b
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c046981e>] lock_acquire+0xbc/0xdc
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c08127f3>] __mutex_lock_common+0x4a/0x2f0
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c0432502>] ? __might_sleep+0x29/0xfb
 [<c0466a50>] ? trace_hardirqs_off+0xb/0xd
 [<c0812b4e>] mutex_lock_nested+0x39/0x3e
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c0524a05>] ep_scan_ready_list+0x32/0x154
 [<c05245df>] ? ep_remove+0x9b/0x9b
 [<c0524b3b>] ep_poll_readyevents_proc+0x14/0x16
 [<c05247ea>] ep_call_nested.constprop.2+0x6d/0x9a
 [<c0524b27>] ? ep_scan_ready_list+0x154/0x154
 [<c05248e6>] ep_eventpoll_poll+0x45/0x55
 [<c0524fa0>] sys_epoll_ctl+0x1ff/0x481
 [<c052470f>] ? ep_send_events_proc+0xd5/0xd5
 [<c0819fdf>] sysenter_do_call+0x12/0x38

3) Apparently this is something that is triggered by a brand new version
of systemd (systemd-30-1.fc16.i686, compiled on July 13th, which I
installed just yesterday, July 19th), as I do not recall seeing this
before.

4) Feel free to prod me for more information. 


Paul Bolle



* Re: recursive locking: epoll.
  2011-07-15 21:04 ` recursive locking: epoll Dave Jones
  2011-07-20  8:05   ` Paul Bolle
@ 2011-07-21 11:55   ` Paul Bolle
  2011-07-29 18:50     ` Paul Bolle
  1 sibling, 1 reply; 6+ messages in thread
From: Paul Bolle @ 2011-07-21 11:55 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel, davidel

On Wed, 2011-07-20 at 10:05 +0200, Paul Bolle wrote:
> 0) Does this have a bugzilla.redhat.com number?

That number turned out to be 722472
( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).


Paul Bolle



* Re: recursive locking: epoll.
  2011-07-21 11:55   ` Paul Bolle
@ 2011-07-29 18:50     ` Paul Bolle
  2011-07-30 18:26       ` Nelson Elhage
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Bolle @ 2011-07-29 18:50 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel, Davide Libenzi, Nelson Elhage
  Cc: Linux Kernel, davidel, Dave Jones

(Sent to the addresses get_maintainer.pl suggested and to Davide and
Nelson, because this is about code they cared about half a year ago.
CC'ed to the addresses involved until now.)

On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
> That number turned out to be 722472
> ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).

0) This seems to be a lockdep false alarm. The cause is an epoll
instance added to another epoll instance (ie, nesting epoll instances).
Apparently lockdep isn't supplied enough information to determine what's
going on here. Now there might be a number of ways to fix this. But
after having looked at this for quite some time and updating the above
bug report a number of times, I guessed that involving people outside
those tracking that report might move things forward towards a solution.
At least, I wasn't able to find a, well, clean solution.

1) The call chain triggering the warning with the nice
    *** DEADLOCK ***

line can be summarized like this:

sys_epoll_ctl
    mutex_lock                          epmutex
    ep_call_nested
        ep_loop_check_proc
            mutex_lock                      ep->mtx
            mutex_unlock                    ep->mtx
    mutex_lock                              ep->mtx
    ep_eventpoll_poll
        ep_ptable_queue_proc
        ep_call_nested
            ep_poll_readyevents_proc
                ep_scan_ready_list
                    mutex_lock                  ep->mtx
                    ep_read_events_proc
                    mutex_unlock                ep->mtx
    mutex_unlock                            ep->mtx
    mutex_unlock                        epmutex

2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
recursive locking on ep->mtx. It is not supplied enough information to
determine that the lock is related to two separate epoll instances (the
outer instance and the nested instance). The solution appears to involve
supplying lockdep that information (ie, "lockdep annotation"). 

3) Please see the bugzilla.redhat.com report for further background.


Paul Bolle



* Re: recursive locking: epoll.
  2011-07-29 18:50     ` Paul Bolle
@ 2011-07-30 18:26       ` Nelson Elhage
  2011-07-30 21:25         ` Nelson Elhage
  0 siblings, 1 reply; 6+ messages in thread
From: Nelson Elhage @ 2011-07-30 18:26 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Linux Kernel,
	Dave Jones

Oof, this is kinda ugly.

I *believe* that as of

 22bacca4 epoll: prevent creating circular epoll structures

the epoll locking is correct, with the rule that two ep->mtx's can be
locked recursively iff ep1 contains ep2 (possibly indirectly), and
that if ep1 contains ep2, ep1 will always be locked first. Since
22bacca4 eliminated the possibility of epoll cycles, this means there
is a well-defined lock order.

I *think* that for any static configuration of epoll file descriptors,
we can fix the problem by doing something like using the "call_nests"
parameter passed by ep_call_nested as the lock subkey, but I haven't
thought this through completely.

However, since that lock order is subject to change, and even
reversal, at runtime, I think the following (pathological) sequence of
userspace calls will trigger lockdep warnings, even though there is
never any risk of deadlock:

 - Create epoll fds ep1 and ep2
 - Add ep1 to ep2
 - Do some operations that result in recursive locking
 - Remove ep1 from ep2
 - Add ep2 to ep1
 - Do some operations that result in recursive locking

In fact, that program should trigger warnings even if we did the
pathological thing of using the address of the 'struct eventpoll' as
the subclass [1], since it is *literally the same two locks* that are
getting acquired in different orders at different times.

I also don't see a way to simplify the epoll locking without adding
more restrictions to how the API can be used. As far as I can tell,
the situation really is just that nasty.

- Nelson

[1] Never mind that the "subclass" is an unsigned int, so we can't
    even do that directly on 64-bit systems.

On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote:
> (Sent to the addresses get_maintainer.pl suggested and to Davide and
> Nelson, because this is about code they cared about half a year ago.
> CC'ed to the addresses involved until now.)
> 
> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
> > That number turned out to be 722472
> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).
> 
> 0) This seems to be a lockdep false alarm. The cause is an epoll
> instance added to another epoll instance (ie, nesting epoll instances).
> Apparently lockdep isn't supplied enough information to determine what's
> going on here. Now there might be a number of ways to fix this. But
> after having looked at this for quite some time and updating the above
> bug report a number of times, I guessed that involving people outside
> those tracking that report might move things forward towards a solution.
> At least, I wasn't able to find a, well, clean solution.
> 
> 1) The call chain triggering the warning with the nice
>     *** DEADLOCK ***
> 
> line can be summarized like this:
> 
> sys_epoll_ctl
>     mutex_lock                          epmutex
>     ep_call_nested
>         ep_loop_check_proc
>             mutex_lock                      ep->mtx
>             mutex_unlock                    ep->mtx
>     mutex_lock                              ep->mtx
>     ep_eventpoll_poll
>         ep_ptable_queue_proc
>         ep_call_nested
>             ep_poll_readyevents_proc
>                 ep_scan_ready_list
>                     mutex_lock                  ep->mtx
>                     ep_read_events_proc
>                     mutex_unlock                ep->mtx
>     mutex_unlock                            ep->mtx
>     mutex_unlock                        epmutex
> 
> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
> recursive locking on ep->mtx. It is not supplied enough information to
> determine that the lock is related to two separate epoll instances (the
> outer instance and the nested instance). The solution appears to involve
> supplying lockdep that information (ie, "lockdep annotation"). 
> 
> 3) Please see the bugzilla.redhat.com report for further background.
> 
> 
> Paul Bolle
> 


* Re: recursive locking: epoll.
  2011-07-30 18:26       ` Nelson Elhage
@ 2011-07-30 21:25         ` Nelson Elhage
  0 siblings, 0 replies; 6+ messages in thread
From: Nelson Elhage @ 2011-07-30 21:25 UTC (permalink / raw)
  To: Paul Bolle
  Cc: Alexander Viro, linux-fsdevel, Davide Libenzi, Linux Kernel,
	Dave Jones

On Sat, Jul 30, 2011 at 2:26 PM, Nelson Elhage <nelhage@ksplice.com> wrote:
> Oof, this is kinda ugly.
>
> I *believe* that as of
>
>  22bacca4 epoll: prevent creating circular epoll structures
>
> the epoll locking is correct, with the rule that two ep->mtx's can be
> locked recursively iff ep1 contains ep2 (possibly indirectly), and
> that if ep1 contains ep2, ep1 will always be locked first. Since
> 22bacca4 eliminated the possibility of epoll cycles, this means there
> is a well-defined lock order.
>
> I *think* that for any static configuration of epoll file descriptors,
> we can fix the problem by doing something like using the "call_nests"
> parameter passed by ep_call_nested as the lock subkey, but I haven't
> thought this through completely.
>
> However, since that lock order is subject to change, and even
> reversal, at runtime, I think the following (pathological) sequence of
> userspace calls will trigger lockdep warnings, even though there is
> never any risk of deadlock:

Thinking about this more, I think that the "call_nests" approach won't
have this problem, since that lets the locks change subclasses exactly
as necessary. Unless someone beats me to it, I'll see if I can put
such a patch together.

- Nelson

>
>  - Create epoll fds ep1 and ep2
>  - Add ep1 to ep2
>  - Do some operations that result in recursive locking
>  - Remove ep1 from ep2
>  - Add ep2 to ep1
>  - Do some operations that result in recursive locking
>
> In fact, that program should trigger warnings even if we did the
> pathological thing of using the address of the 'struct eventpoll' as
> the subclass [1], since it is *literally the same two locks* that are
> getting acquired in different orders at different times.
>
> I also don't see a way to simplify the epoll locking without adding
> more restrictions to how the API can be used. As far as I can tell,
> the situation really is just that nasty.
>
> - Nelson
>
> [1] Never mind that the "subclass" is an unsigned int, so we can't
>    even do that directly on 64-bit systems.
>
> On Fri, Jul 29, 2011 at 08:50:55PM +0200, Paul Bolle wrote:
>> (Sent to the addresses get_maintainer.pl suggested and to Davide and
>> Nelson, because this is about code they cared about half a year ago.
>> CC'ed to the addresses involved until now.)
>>
>> On Thu, 2011-07-21 at 13:55 +0200, Paul Bolle wrote:
>> > That number turned out to be 722472
>> > ( https://bugzilla.redhat.com/show_bug.cgi?id=722472 ).
>>
>> 0) This seems to be a lockdep false alarm. The cause is an epoll
>> instance added to another epoll instance (ie, nesting epoll instances).
>> Apparently lockdep isn't supplied enough information to determine what's
>> going on here. Now there might be a number of ways to fix this. But
>> after having looked at this for quite some time and updating the above
>> bug report a number of times, I guessed that involving people outside
>> those tracking that report might move things forward towards a solution.
>> At least, I wasn't able to find a, well, clean solution.
>>
>> 1) The call chain triggering the warning with the nice
>>     *** DEADLOCK ***
>>
>> line can be summarized like this:
>>
>> sys_epoll_ctl
>>     mutex_lock                          epmutex
>>     ep_call_nested
>>         ep_loop_check_proc
>>             mutex_lock                      ep->mtx
>>             mutex_unlock                    ep->mtx
>>     mutex_lock                              ep->mtx
>>     ep_eventpoll_poll
>>         ep_ptable_queue_proc
>>         ep_call_nested
>>             ep_poll_readyevents_proc
>>                 ep_scan_ready_list
>>                     mutex_lock                  ep->mtx
>>                     ep_read_events_proc
>>                     mutex_unlock                ep->mtx
>>     mutex_unlock                            ep->mtx
>>     mutex_unlock                        epmutex
>>
>> 2) When ep_scan_ready_list() calls mutex_lock(), lockdep notices
>> recursive locking on ep->mtx. It is not supplied enough information to
>> determine that the lock is related to two separate epoll instances (the
>> outer instance and the nested instance). The solution appears to involve
>> supplying lockdep that information (ie, "lockdep annotation").
>>
>> 3) Please see the bugzilla.redhat.com report for further background.
>>
>>
>> Paul Bolle
>>
>

