From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL rcu/urgent] Fix two more 4.3 regressions
Date: Sun, 27 Sep 2015 10:41:39 -0700
Message-ID: <20150927174139.GJ30373@linux.vnet.ibm.com>
In-Reply-To: <CA+icZUVooOB5ZbrnGJQ_bqa5BUnq6BYQNCLygVF+mqWShhv7RA@mail.gmail.com>

On Sun, Sep 27, 2015 at 07:24:22PM +0200, Sedat Dilek wrote:
> On Sun, Sep 27, 2015 at 6:42 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Sun, Sep 27, 2015 at 05:55:43PM +0200, Sedat Dilek wrote:
> >> On Sun, Sep 27, 2015 at 5:49 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Sun, Sep 27, 2015 at 09:37:05AM +0200, Sedat Dilek wrote:
> >> >> On Sun, Sep 27, 2015 at 9:32 AM, Paul E. McKenney
> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> > On Sun, Sep 27, 2015 at 08:28:39AM +0200, Sedat Dilek wrote:
> >
> > [ . . . ]
> >
> >> Hi Paul :-),
> >>
> >> Here is the stack trace.
> >>
> >> [ 23.045871] BUG: sleeping function called from invalid context at kernel/workqueue.c:2678
> >> [ 23.045982] in_atomic(): 0, irqs_disabled(): 1, pid: 1399, name: acpid
> >> [ 23.046064] 3 locks held by acpid/1399:
> >> [ 23.046066] #0: (&evdev->mutex){+.+...}, at: [<ffffffff8174ac7c>] evdev_release+0xbc/0xf0
> >> [ 23.046081] #1: (&dev->mutex#2){+.+...}, at: [<ffffffff81742397>] input_close_device+0x27/0x70
> >> [ 23.046093] #2: (hid_open_mut){+.+...}, at: [<ffffffffa0056388>] usbhid_close+0x28/0xb0 [usbhid]
> >> [ 23.046106] irq event stamp: 3306
> >> [ 23.046109] hardirqs last enabled at (3305): [<ffffffff8192ae32>] _raw_spin_unlock_irq+0x32/0x60
> >> [ 23.046115] hardirqs last disabled at (3306): [<ffffffff81121017>] del_timer_sync+0x37/0x110
> >> [ 23.046122] softirqs last enabled at (2704): [<ffffffff818b12c9>] local_bh_enable+0x9/0x20
> >> [ 23.046128] softirqs last disabled at (2702): [<ffffffff818b12a9>] local_bh_disable+0x9/0x20
> >> [ 23.046136] CPU: 2 PID: 1399 Comm: acpid Not tainted 4.3.0-rc3-3-llvmlinux-amd64 #1
> >> [ 23.046139] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> >> [ 23.046143] ffff8800d36ee948 0000000000000092 0000000000000000 ffff8800bbacfae8
> >> [ 23.046151] ffffffff8149adad ffff8800bbacfb18 ffffffff810cd1ea ffffffff81c56f0a
> >> [ 23.046158] ffff8800c22dc400 0000000000000000 0000000000000a76 ffff8800bbacfb58
> >> [ 23.046165] Call Trace:
> >> [ 23.046172] [<ffffffff8149adad>] dump_stack+0x7d/0xa0
> >> [ 23.046177] [<ffffffff810cd1ea>] ___might_sleep+0x28a/0x2a0
> >> [ 23.046182] [<ffffffff810cceef>] __might_sleep+0x4f/0xc0
> >> [ 23.046187] [<ffffffff810afbff>] start_flush_work+0x2f/0x290
> >> [ 23.046192] [<ffffffff810afbac>] flush_work+0x5c/0x80
> >> [ 23.046195] [<ffffffff810afb6a>] ? flush_work+0x1a/0x80
> >> [ 23.046202] [<ffffffff810eed0d>] ? trace_hardirqs_off+0xd/0x10
> >> [ 23.046206] [<ffffffff810aecc8>] ? try_to_grab_pending+0x48/0x360
> >> [ 23.046211] [<ffffffff8192ac53>] ? _raw_spin_lock_irqsave+0x73/0x80
> >> [ 23.046216] [<ffffffff810afff9>] __cancel_work_timer+0x179/0x260
> >> [ 23.046221] [<ffffffff8192add2>] ? _raw_spin_unlock_irqrestore+0x52/0x80
> >> [ 23.046226] [<ffffffff81120fcd>] ? try_to_del_timer_sync+0xad/0xc0
> >> [ 23.046230] [<ffffffff810afe78>] cancel_work_sync+0x18/0x20
> >> [ 23.046237] [<ffffffffa00563d5>] usbhid_close+0x75/0xb0 [usbhid]
> >> [ 23.046245] [<ffffffffa00394d1>] hidinput_close+0x31/0x40 [hid]
> >> [ 23.046251] [<ffffffffa00394a0>] ? hidinput_open+0x40/0x40 [hid]
> >> [ 23.046256] [<ffffffff817423b8>] input_close_device+0x48/0x70
> >> [ 23.046261] [<ffffffff8174ac96>] evdev_release+0xd6/0xf0
> >> [ 23.046267] [<ffffffff812728c7>] __fput+0x107/0x240
> >> [ 23.046271] [<ffffffff81272756>] ____fput+0x16/0x20
> >> [ 23.046276] [<ffffffff810b945c>] task_work_run+0x6c/0xe0
> >> [ 23.046282] [<ffffffff81003aa7>] prepare_exit_to_usermode+0x117/0x120
> >> [ 23.046287] [<ffffffff81003ce1>] syscall_return_slowpath+0x231/0x2a0
> >> [ 23.046292] [<ffffffff8126efa5>] ? filp_close+0x65/0x90
> >> [ 23.046298] [<ffffffff810ef1c9>] ? trace_hardirqs_on_caller+0x19/0x290
> >> [ 23.046303] [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
> >> [ 23.046308] [<ffffffff8192bb62>] int_ret_from_sys_call+0x25/0x9f
> >>
> >> Can you give some help on how to "debug" this?
> >>
> >> I switched from full dynticks to simple CPU accounting, which did not help.
> >> But that was only a hunch, since Jiri pointed to the possibility that
> >> del_timer_sync() might have been mis-optimized.
> >>
> >> So I am out of ideas here.
> >
> > I am not familiar with the "hardirqs last enabled" debug output, but
> > I am guessing that hardirqs are disabled because the "last disabled"
> > number is greater than the "last enabled" number. So if I understand
> > correctly, something somewhere up the call stack has irqs disabled,
> > but is calling a function that needs them enabled.
> >
> > A quick look at usbhid_close() shows no problem, but then again, I am
> > looking at the v4.2 version rather than whatever you are testing.
> > But v4.3-rc2 looks the same.
> >
> > Looks like you are already in contact with Jiri, which is good.
> >
> > Another thing to try would be to insert might_sleep() calls at
> > points after usbhid_close() explicitly enables irqs but before the
> > call to flush_work(). Jiri's comment can be interpreted as a
> > suggestion to insert a might_sleep() in try_to_del_timer_sync(),
> > though __cancel_work_timer() wouldn't be a bad place to try as
> > well, just before its call to flush_work().
>
> With...
>
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1026,6 +1026,8 @@ int try_to_del_timer_sync(struct timer_list *timer)
>  	unsigned long flags;
>  	int ret = -1;
>  
> +	might_sleep();
> +
>  	debug_assert_init(timer);
>  
>  	base = lock_timer_base(timer, &flags);
>
> ...I get now...
>
> [ 53.104411] BUG: sleeping function called from invalid context at kernel/time/timer.c:1029
> [ 53.104417] in_atomic(): 0, irqs_disabled(): 1, pid: 2100, name: fusermount
> [ 53.104419] 1 lock held by fusermount/2100:
> [ 53.104420] #0: (&type->s_umount_key#33){+.+...}, at: [<ffffffff812746ab>] deactivate_super+0x5b/0x70
> [ 53.104432] irq event stamp: 3734
> [ 53.104433] hardirqs last enabled at (3733): [<ffffffff810aefb8>] mod_delayed_work_on+0x78/0xa0
> [ 53.104438] hardirqs last disabled at (3734): [<ffffffff810b045f>] flush_delayed_work+0x1f/0x70
> [ 53.104441] softirqs last enabled at (3730): [<ffffffff81204997>] wb_shutdown+0x47/0xb0
> [ 53.104444] softirqs last disabled at (3728): [<ffffffff81204976>] wb_shutdown+0x26/0xb0
> [ 53.104449] CPU: 3 PID: 2100 Comm: fusermount Not tainted 4.3.0-rc3-4-llvmlinux-amd64 #2
> [ 53.104451] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [ 53.104453] 000000000000000a 0000000000000096 0000000000000000 ffff8800af39bb28
> [ 53.104457] ffffffff8149cadd ffff8800af39bb58 ffffffff810cd5aa ffffffff81c5f2e8
> [ 53.104461] ffff8800c5785940 0000000000000000 0000000000000405 ffff8800af39bb98
> [ 53.104465] Call Trace:
> [ 53.104470] [<ffffffff8149cadd>] dump_stack+0x7d/0xa0
> [ 53.104473] [<ffffffff810cd5aa>] ___might_sleep+0x28a/0x2a0
> [ 53.104475] [<ffffffff810cd2af>] __might_sleep+0x4f/0xc0
> [ 53.104480] [<ffffffff81122055>] try_to_del_timer_sync+0x25/0xe0
> [ 53.104483] [<ffffffff81122204>] del_timer_sync+0xf4/0x110
> [ 53.104486] [<ffffffff8112214b>] ? del_timer_sync+0x3b/0x110
> [ 53.104497] [<ffffffff810b0468>] flush_delayed_work+0x28/0x70
> [ 53.104499] [<ffffffff812049cb>] wb_shutdown+0x7b/0xb0
> [ 53.104501] [<ffffffff812046fd>] bdi_destroy+0x7d/0x2d0
> [ 53.104505] [<ffffffff810efe2d>] ? trace_hardirqs_on+0xd/0x10
> [ 53.104509] [<ffffffff813b7b08>] fuse_put_super+0xf8/0x140
> [ 53.104511] [<ffffffff813b7a10>] ? fuse_evict_inode+0x80/0x80
> [ 53.104513] [<ffffffff81274778>] generic_shutdown_super+0x68/0x120
> [ 53.104515] [<ffffffff81275a80>] kill_anon_super+0x20/0x70
> [ 53.104518] [<ffffffff813b6fcd>] fuse_kill_sb_anon+0x4d/0x60
> [ 53.104520] [<ffffffff812745dd>] deactivate_locked_super+0x4d/0xc0
> [ 53.104522] [<ffffffff812746b3>] deactivate_super+0x63/0x70
> [ 53.104525] [<ffffffff8129fa02>] cleanup_mnt+0xb2/0x140
> [ 53.104528] [<ffffffff8129f93a>] __cleanup_mnt+0x1a/0x30
> [ 53.104531] [<ffffffff810b97bc>] task_work_run+0x6c/0xe0
> [ 53.104534] [<ffffffff81003b8a>] prepare_exit_to_usermode+0x13a/0x140
> [ 53.104537] [<ffffffff81003e11>] syscall_return_slowpath+0x281/0x2f0
> [ 53.104540] [<ffffffff8129c121>] ? SyS_umount+0x341/0x620
> [ 53.104542] [<ffffffff810f02d9>] ? trace_hardirqs_on_caller+0x19/0x290
> [ 53.104545] [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [ 53.104550] [<ffffffff8192da22>] int_ret_from_sys_call+0x25/0x9f
>
> A problem in fuse?
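Or there is a need to qualify might_sleep() by its caller. Some paths
legitimately call try_to_del_timer_sync() with irqs disabled. In this
trace, hardirqs were last disabled in flush_delayed_work(), which
disables them on purpose around del_timer_sync(). So perhaps set a
per-CPU variable in the caller you suspect (usbhid_close() in the
original trace) and only call might_sleep() if that per-CPU variable
is set. Of course, have the caller clear it on return.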
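A minimal userspace sketch of that idea, for illustration only. It makes several assumptions. A thread-local flag stands in for a per-CPU variable; in the kernel this would presumably use DEFINE_PER_CPU() with this_cpu_read()/this_cpu_write(). A plain boolean stands in for irqs_disabled(). qualified_might_sleep() and the *_model() functions are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins: irqs_off models irqs_disabled(), and the
 * thread-local check_sleep flag models a per-CPU "please check" flag. */
static _Thread_local bool irqs_off;
static _Thread_local bool check_sleep;
static int splats; /* count of simulated "BUG: sleeping function" reports */

/* A might_sleep() that only complains when the suspect caller armed it. */
static void qualified_might_sleep(const char *where)
{
	if (check_sleep && irqs_off) {
		splats++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context at %s\n",
			where);
	}
}

/* Model of try_to_del_timer_sync() with the debug check at its top. */
static void try_to_del_timer_sync_model(void)
{
	qualified_might_sleep("try_to_del_timer_sync_model");
	/* ... the real timer-deletion work would go here ... */
}

/* Model of flush_delayed_work(): it disables irqs on purpose around the
 * timer deletion, so an unconditional might_sleep() would be a false alarm. */
static void flush_delayed_work_model(void)
{
	irqs_off = true;
	try_to_del_timer_sync_model();
	irqs_off = false;
}

/* Model of the suspect caller (usbhid_close()). leak_irqs_off simulates
 * irqs wrongly left disabled, e.g. by a miscompiled irq restore. */
static void usbhid_close_model(bool leak_irqs_off)
{
	check_sleep = true;	/* arm the check for this call chain */
	irqs_off = leak_irqs_off;
	try_to_del_timer_sync_model();
	irqs_off = false;
	check_sleep = false;	/* and disarm it on return */
}
```

Under these assumptions, the deliberate irqs-off call from flush_delayed_work_model() stays quiet. usbhid_close_model(true), which simulates irqs left disabled, is reported once. One caveat applies to the real kernel: the suspect caller runs preemptibly (it holds a mutex), so it could migrate to another CPU between setting and checking a per-CPU flag. A per-task flag would avoid that.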
Thanx, Paul