Re: possible deadlock in flush

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: possible deadlock in flush_work (2)
       [not found] ` <001a113ee9baf95598055d384ecb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2017-11-05  8:53   ` Dmitry Vyukov
  2017-11-05 16:00     ` Trond Myklebust
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Vyukov @ 2017-11-05  8:53 UTC (permalink / raw)
  To: syzbot
  Cc: Lai Jiangshan, LKML, syzkaller-bugs-/JYPxA39Uh5TLH3MbocFFw,
	Tejun Heo, Bruce Fields, Jeff Layton,
	trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA, David Miller,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev

On Sun, Nov 5, 2017 at 11:41 AM, syzbot
<bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4-Pl5Pbv+GP7P466ipTTIvnc23WoclnBCfAL8bYrjMMd8@public.gmane.org>
wrote:
> Hello,
>
> syzkaller hit the following crash on
> 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> xs_tcp_setup_socket: connect returned unhandled error -113
> xs_tcp_setup_socket: connect returned unhandled error -113
> xs_tcp_setup_socket: connect returned unhandled error -113
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.14.0-rc5-mm1+ #20 Not tainted
> ------------------------------------------------------
> kworker/0:3/3400 is trying to acquire lock:
>  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
> kernel/workqueue.c:2850 [inline]
>  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
> kernel/workqueue.c:2882
>
> but task is already holding lock:
>  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
> process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 ((&task->u.tk_work)){+.+.}:
>        process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
>        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>        kthread+0x38b/0x470 kernel/kthread.c:242
>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
> -> #0 ("xprtiod"){+.+.}:
>        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>        start_flush_work kernel/workqueue.c:2851 [inline]
>        flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>        __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>        cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>        xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>        xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>        kref_put include/linux/kref.h:70 [inline]
>        xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>        rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>        rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>        rpc_release_task net/sunrpc/sched.c:1059 [inline]
>        __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>        rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>        process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>        kthread+0x38b/0x470 kernel/kthread.c:242
>        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock((&task->u.tk_work));
>                                lock("xprtiod");
>                                lock((&task->u.tk_work));
>   lock("xprtiod");
>
>  *** DEADLOCK ***
>
> 2 locks held by kworker/0:3/3400:
>  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
> include/linux/compiler.h:305 [inline]
>  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
> arch/x86/include/asm/atomic64_64.h:33 [inline]
>  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
> include/asm-generic/atomic-long.h:56 [inline]
>  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
> kernel/workqueue.c:618 [inline]
>  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
> set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
>  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
> process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
>  #1:  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
> process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>
> stack backtrace:
> CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: rpciod rpc_async_schedule
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  print_circular_bug.isra.41+0x342/0x36a kernel/locking/lockdep.c:1258
>  check_prev_add kernel/locking/lockdep.c:1901 [inline]
>  check_prevs_add kernel/locking/lockdep.c:2018 [inline]
>  validate_chain kernel/locking/lockdep.c:2460 [inline]
>  __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
>  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>  start_flush_work kernel/workqueue.c:2851 [inline]
>  flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>  __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>  cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>  xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>  xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>  kref_put include/linux/kref.h:70 [inline]
>  xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>  rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>  rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>  rpc_release_task net/sunrpc/sched.c:1059 [inline]
>  __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>  rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>  process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>  kthread+0x38b/0x470 kernel/kthread.c:242
>  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431


+sunrpc maintainers


> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> Please credit me with: Reported-by: syzbot <syzkaller-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
>
> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line.
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a113ee9baf95598055d384ecb%40google.com.
> For more options, visit https://groups.google.com/d/optout.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: possible deadlock in flush_work (2)
  2017-11-05  8:53   ` possible deadlock in flush_work (2) Dmitry Vyukov
@ 2017-11-05 16:00     ` Trond Myklebust
  2017-11-06 10:34       ` Dmitry Vyukov
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2017-11-05 16:00 UTC (permalink / raw)
  To: dvyukov@google.com,
	bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.com
  Cc: linux-kernel@vger.kernel.org, bfields@fieldses.org,
	linux-nfs@vger.kernel.org, jlayton@poochiereds.net,
	jiangshanlai@gmail.com, anna.schumaker@netapp.com,
	netdev@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	tj@kernel.org, davem@davemloft.net



On Sun, 2017-11-05 at 11:53 +0300, Dmitry Vyukov wrote:
> On Sun, Nov 5, 2017 at 11:41 AM, syzbot
> <bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.c
> om>
> wrote:
> > Hello,
> > 
> > syzkaller hit the following crash on
> > 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
> > git://git.cmpxchg.org/linux-mmots.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > 
> > xs_tcp_setup_socket: connect returned unhandled error -113
> > xs_tcp_setup_socket: connect returned unhandled error -113
> > xs_tcp_setup_socket: connect returned unhandled error -113
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.14.0-rc5-mm1+ #20 Not tainted
> > ------------------------------------------------------
> > kworker/0:3/3400 is trying to acquire lock:
> >  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
> > kernel/workqueue.c:2850 [inline]
> >  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
> > kernel/workqueue.c:2882
> > 
> > but task is already holding lock:
> >  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #1 ((&task->u.tk_work)){+.+.}:
> >        process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
> >        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> >        kthread+0x38b/0x470 kernel/kthread.c:242
> >        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
> > 
> > -> #0 ("xprtiod"){+.+.}:
> >        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
> >        start_flush_work kernel/workqueue.c:2851 [inline]
> >        flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
> >        __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
> >        cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
> >        xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
> >        xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
> >        kref_put include/linux/kref.h:70 [inline]
> >        xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
> >        rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
> >        rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
> >        rpc_release_task net/sunrpc/sched.c:1059 [inline]
> >        __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
> >        rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
> >        process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
> >        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> >        kthread+0x38b/0x470 kernel/kthread.c:242
> >        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
> > 
> > other info that might help us debug this:
> > 
> >  Possible unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock((&task->u.tk_work));
> >                                lock("xprtiod");
> >                                lock((&task->u.tk_work));
> >   lock("xprtiod");
> > 
> >  *** DEADLOCK ***
> > 
> > 2 locks held by kworker/0:3/3400:
> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
> > include/linux/compiler.h:305 [inline]
> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
> > arch/x86/include/asm/atomic64_64.h:33 [inline]
> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
> > include/asm-generic/atomic-long.h:56 [inline]
> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
> > kernel/workqueue.c:618 [inline]
> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
> > set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
> > process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
> >  #1:  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
> > 
> > stack backtrace:
> > CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS
> > Google 01/01/2011
> > Workqueue: rpciod rpc_async_schedule
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:16 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:52
> >  print_circular_bug.isra.41+0x342/0x36a
> > kernel/locking/lockdep.c:1258
> >  check_prev_add kernel/locking/lockdep.c:1901 [inline]
> >  check_prevs_add kernel/locking/lockdep.c:2018 [inline]
> >  validate_chain kernel/locking/lockdep.c:2460 [inline]
> >  __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
> >  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
> >  start_flush_work kernel/workqueue.c:2851 [inline]
> >  flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
> >  __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
> >  cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
> >  xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
> >  xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
> >  kref_put include/linux/kref.h:70 [inline]
> >  xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
> >  rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
> >  rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
> >  rpc_release_task net/sunrpc/sched.c:1059 [inline]
> >  __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
> >  rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
> >  process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
> >  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
> >  kthread+0x38b/0x470 kernel/kthread.c:242
> >  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
> 
> 
> +sunrpc maintainers

A fix for this has already been merged. Please retest with an up to
date kernel.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: possible deadlock in flush_work (2)
  2017-11-05 16:00     ` Trond Myklebust
@ 2017-11-06 10:34       ` Dmitry Vyukov
  2018-02-13 18:55         ` Dmitry Vyukov
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Vyukov @ 2017-11-06 10:34 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.com,
	linux-kernel@vger.kernel.org, bfields@fieldses.org,
	linux-nfs@vger.kernel.org, jlayton@poochiereds.net,
	jiangshanlai@gmail.com, anna.schumaker@netapp.com,
	netdev@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	tj@kernel.org, davem@davemloft.net

On Sun, Nov 5, 2017 at 5:00 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>
>
> On Sun, 2017-11-05 at 11:53 +0300, Dmitry Vyukov wrote:
>> On Sun, Nov 5, 2017 at 11:41 AM, syzbot
>> <bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.c
>> om>
>> wrote:
>> > Hello,
>> >
>> > syzkaller hit the following crash on
>> > 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
>> > git://git.cmpxchg.org/linux-mmots.git/master
>> > compiler: gcc (GCC) 7.1.1 20170620
>> > .config is attached
>> > Raw console output is attached.
>> >
>> > xs_tcp_setup_socket: connect returned unhandled error -113
>> > xs_tcp_setup_socket: connect returned unhandled error -113
>> > xs_tcp_setup_socket: connect returned unhandled error -113
>> >
>> > ======================================================
>> > WARNING: possible circular locking dependency detected
>> > 4.14.0-rc5-mm1+ #20 Not tainted
>> > ------------------------------------------------------
>> > kworker/0:3/3400 is trying to acquire lock:
>> >  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
>> > kernel/workqueue.c:2850 [inline]
>> >  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
>> > kernel/workqueue.c:2882
>> >
>> > but task is already holding lock:
>> >  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>> >
>> > which lock already depends on the new lock.
>> >
>> >
>> > the existing dependency chain (in reverse order) is:
>> >
>> > -> #1 ((&task->u.tk_work)){+.+.}:
>> >        process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
>> >        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>> >        kthread+0x38b/0x470 kernel/kthread.c:242
>> >        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>> >
>> > -> #0 ("xprtiod"){+.+.}:
>> >        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>> >        start_flush_work kernel/workqueue.c:2851 [inline]
>> >        flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>> >        __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>> >        cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>> >        xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>> >        xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>> >        kref_put include/linux/kref.h:70 [inline]
>> >        xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>> >        rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>> >        rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>> >        rpc_release_task net/sunrpc/sched.c:1059 [inline]
>> >        __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>> >        rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>> >        process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>> >        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>> >        kthread+0x38b/0x470 kernel/kthread.c:242
>> >        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>> >
>> > other info that might help us debug this:
>> >
>> >  Possible unsafe locking scenario:
>> >
>> >        CPU0                    CPU1
>> >        ----                    ----
>> >   lock((&task->u.tk_work));
>> >                                lock("xprtiod");
>> >                                lock((&task->u.tk_work));
>> >   lock("xprtiod");
>> >
>> >  *** DEADLOCK ***
>> >
>> > 2 locks held by kworker/0:3/3400:
>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
>> > include/linux/compiler.h:305 [inline]
>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
>> > arch/x86/include/asm/atomic64_64.h:33 [inline]
>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
>> > include/asm-generic/atomic-long.h:56 [inline]
>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
>> > kernel/workqueue.c:618 [inline]
>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>> > set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>> > process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
>> >  #1:  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>> >
>> > stack backtrace:
>> > CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
>> > Hardware name: Google Google Compute Engine/Google Compute Engine,
>> > BIOS
>> > Google 01/01/2011
>> > Workqueue: rpciod rpc_async_schedule
>> > Call Trace:
>> >  __dump_stack lib/dump_stack.c:16 [inline]
>> >  dump_stack+0x194/0x257 lib/dump_stack.c:52
>> >  print_circular_bug.isra.41+0x342/0x36a
>> > kernel/locking/lockdep.c:1258
>> >  check_prev_add kernel/locking/lockdep.c:1901 [inline]
>> >  check_prevs_add kernel/locking/lockdep.c:2018 [inline]
>> >  validate_chain kernel/locking/lockdep.c:2460 [inline]
>> >  __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
>> >  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>> >  start_flush_work kernel/workqueue.c:2851 [inline]
>> >  flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>> >  __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>> >  cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>> >  xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>> >  xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>> >  kref_put include/linux/kref.h:70 [inline]
>> >  xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>> >  rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>> >  rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>> >  rpc_release_task net/sunrpc/sched.c:1059 [inline]
>> >  __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>> >  rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>> >  process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>> >  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>> >  kthread+0x38b/0x470 kernel/kthread.c:242
>> >  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>
>>
>> +sunrpc maintainers
>
> A fix for this has already been merged. Please retest with an up to
> date kernel.

Hi,

What's the fix? Please specify it in the following form:

> syzbot will keep track of this bug report.
> Once a fix for this bug is committed, please reply to this email with:
> #syz fix: exact-commit-title
> Note: all commands must start from beginning of the line.

The bot tests HEAD of multiple branches and it needs to know what
fixes what, because this still happens on other branches. Once the bot
knows the fix, it will track when the fix reaches all tested branches.

Thanks

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: possible deadlock in flush_work (2)
  2017-11-06 10:34       ` Dmitry Vyukov
@ 2018-02-13 18:55         ` Dmitry Vyukov
  0 siblings, 0 replies; 4+ messages in thread
From: Dmitry Vyukov @ 2018-02-13 18:55 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.com,
	linux-kernel@vger.kernel.org, bfields@fieldses.org,
	linux-nfs@vger.kernel.org, jlayton@poochiereds.net,
	jiangshanlai@gmail.com, anna.schumaker@netapp.com,
	netdev@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	tj@kernel.org, davem@davemloft.net

On Mon, Nov 6, 2017 at 11:34 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Sun, Nov 5, 2017 at 5:00 PM, Trond Myklebust <trondmy@primarydata.com> wrote:
>>
>>
>> On Sun, 2017-11-05 at 11:53 +0300, Dmitry Vyukov wrote:
>>> On Sun, Nov 5, 2017 at 11:41 AM, syzbot
>>> <bot+d8fe95298ef830cd7d05e33eefa4a5a6f6f334d4@syzkaller.appspotmail.c
>>> om>
>>> wrote:
>>> > Hello,
>>> >
>>> > syzkaller hit the following crash on
>>> > 0f611fb6dcc0d6d91b4e1fec911321f434a3b858
>>> > git://git.cmpxchg.org/linux-mmots.git/master
>>> > compiler: gcc (GCC) 7.1.1 20170620
>>> > .config is attached
>>> > Raw console output is attached.
>>> >
>>> > xs_tcp_setup_socket: connect returned unhandled error -113
>>> > xs_tcp_setup_socket: connect returned unhandled error -113
>>> > xs_tcp_setup_socket: connect returned unhandled error -113
>>> >
>>> > ======================================================
>>> > WARNING: possible circular locking dependency detected
>>> > 4.14.0-rc5-mm1+ #20 Not tainted
>>> > ------------------------------------------------------
>>> > kworker/0:3/3400 is trying to acquire lock:
>>> >  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] start_flush_work
>>> > kernel/workqueue.c:2850 [inline]
>>> >  ("xprtiod"){+.+.}, at: [<ffffffff8146adda>] flush_work+0x55a/0x8a0
>>> > kernel/workqueue.c:2882
>>> >
>>> > but task is already holding lock:
>>> >  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>>> >
>>> > which lock already depends on the new lock.
>>> >
>>> >
>>> > the existing dependency chain (in reverse order) is:
>>> >
>>> > -> #1 ((&task->u.tk_work)){+.+.}:
>>> >        process_one_work+0xba2/0x1bc0 kernel/workqueue.c:2088
>>> >        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>>> >        kthread+0x38b/0x470 kernel/kthread.c:242
>>> >        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>> >
>>> > -> #0 ("xprtiod"){+.+.}:
>>> >        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>>> >        start_flush_work kernel/workqueue.c:2851 [inline]
>>> >        flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>>> >        __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>>> >        cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>>> >        xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>>> >        xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>>> >        kref_put include/linux/kref.h:70 [inline]
>>> >        xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>>> >        rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>>> >        rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>>> >        rpc_release_task net/sunrpc/sched.c:1059 [inline]
>>> >        __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>>> >        rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>>> >        process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>>> >        worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>>> >        kthread+0x38b/0x470 kernel/kthread.c:242
>>> >        ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>> >
>>> > other info that might help us debug this:
>>> >
>>> >  Possible unsafe locking scenario:
>>> >
>>> >        CPU0                    CPU1
>>> >        ----                    ----
>>> >   lock((&task->u.tk_work));
>>> >                                lock("xprtiod");
>>> >                                lock((&task->u.tk_work));
>>> >   lock("xprtiod");
>>> >
>>> >  *** DEADLOCK ***
>>> >
>>> > 2 locks held by kworker/0:3/3400:
>>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] __write_once_size
>>> > include/linux/compiler.h:305 [inline]
>>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic64_set
>>> > arch/x86/include/asm/atomic64_64.h:33 [inline]
>>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] atomic_long_set
>>> > include/asm-generic/atomic-long.h:56 [inline]
>>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>] set_work_data
>>> > kernel/workqueue.c:618 [inline]
>>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>>> > set_work_pool_and_clear_pending kernel/workqueue.c:645 [inline]
>>> >  #0:  ("rpciod"){+.+.}, at: [<ffffffff81471e5f>]
>>> > process_one_work+0xadf/0x1bc0 kernel/workqueue.c:2083
>>> >  #1:  ((&task->u.tk_work)){+.+.}, at: [<ffffffff81471eb2>]
>>> > process_one_work+0xb32/0x1bc0 kernel/workqueue.c:2087
>>> >
>>> > stack backtrace:
>>> > CPU: 0 PID: 3400 Comm: kworker/0:3 Not tainted 4.14.0-rc5-mm1+ #20
>>> > Hardware name: Google Google Compute Engine/Google Compute Engine,
>>> > BIOS
>>> > Google 01/01/2011
>>> > Workqueue: rpciod rpc_async_schedule
>>> > Call Trace:
>>> >  __dump_stack lib/dump_stack.c:16 [inline]
>>> >  dump_stack+0x194/0x257 lib/dump_stack.c:52
>>> >  print_circular_bug.isra.41+0x342/0x36a
>>> > kernel/locking/lockdep.c:1258
>>> >  check_prev_add kernel/locking/lockdep.c:1901 [inline]
>>> >  check_prevs_add kernel/locking/lockdep.c:2018 [inline]
>>> >  validate_chain kernel/locking/lockdep.c:2460 [inline]
>>> >  __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487
>>> >  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991
>>> >  start_flush_work kernel/workqueue.c:2851 [inline]
>>> >  flush_work+0x57f/0x8a0 kernel/workqueue.c:2882
>>> >  __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954
>>> >  cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990
>>> >  xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467
>>> >  xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline]
>>> >  kref_put include/linux/kref.h:70 [inline]
>>> >  xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501
>>> >  rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986
>>> >  rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020
>>> >  rpc_release_task net/sunrpc/sched.c:1059 [inline]
>>> >  __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824
>>> >  rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848
>>> >  process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112
>>> >  worker_thread+0x223/0x1990 kernel/workqueue.c:2246
>>> >  kthread+0x38b/0x470 kernel/kthread.c:242
>>> >  ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
>>>
>>>
>>> +sunrpc maintainers
>>
>> A fix for this has already been merged. Please retest with an up to
>> date kernel.
>
> Hi,
>
> What's the fix? Please specify it in the following form:
>
>> syzbot will keep track of this bug report.
>> Once a fix for this bug is committed, please reply to this email with:
>> #syz fix: exact-commit-title
>> Note: all commands must start from beginning of the line.
>
> The bot tests HEAD of multiple branches and it needs to know what
> fixes what, because this still happens on other branches. Once the bot
> knows the fix, it will track when the fix reaches all tested branches.

Seems to be this one:

#syz fix: SUNRPC: Destroy transport from the system workqueue

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-02-13 18:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <001a113ee9baf95598055d384ecb@google.com>
     [not found] ` <001a113ee9baf95598055d384ecb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2017-11-05  8:53   ` possible deadlock in flush_work (2) Dmitry Vyukov
2017-11-05 16:00     ` Trond Myklebust
2017-11-06 10:34       ` Dmitry Vyukov
2018-02-13 18:55         ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).