From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL rcu/urgent] Fix two more 4.3 regressions
Date: Sun, 27 Sep 2015 09:43:54 -0700
Message-ID: <20150927164354.GI30373@linux.vnet.ibm.com>
In-Reply-To: <CA+icZUX91ZYs0hUD+YenSxg-roM_e5_HZ8pOU8Je7N5jSM8ZBg@mail.gmail.com>
On Sun, Sep 27, 2015 at 06:25:37PM +0200, Sedat Dilek wrote:
> On Sun, Sep 27, 2015 at 6:16 PM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> > On Sun, Sep 27, 2015 at 6:02 PM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >> On Sun, Sep 27, 2015 at 5:58 PM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>> On Sun, Sep 27, 2015 at 5:55 PM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>> On Sun, Sep 27, 2015 at 5:49 PM, Paul E. McKenney
> >>>> <paulmck@linux.vnet.ibm.com> wrote:
> >>>>> On Sun, Sep 27, 2015 at 09:37:05AM +0200, Sedat Dilek wrote:
> >>>>>> On Sun, Sep 27, 2015 at 9:32 AM, Paul E. McKenney
> >>>>>> <paulmck@linux.vnet.ibm.com> wrote:
> >>>>>> > On Sun, Sep 27, 2015 at 08:28:39AM +0200, Sedat Dilek wrote:
> >>>>>> >> Hi,
> >>>>>> >>
> >>>>>> >> as I have observed here some lockdep issues (one could be solved in
> >>>>>> >> netdev) I wanted to try this patchset.
> >>>>>> >>
> >>>>>> >> Unfortunately, you cannot pull from...
> >>>>>> >>
> >>>>>> >> "These changes are available in the git repository at:
> >>>>>> >>
> >>>>>> >> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> >>>>>> >>
> >>>>>> >> for you to fetch changes up to 19a5ecde086a6a5287978b12ae948fa691b197b7:
> >>>>>> >>
> >>>>>> >> rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex
> >>>>>> >> (2015-09-20 21:01:22 -0700)
> >>>>>> >>
> >>>>>> >> ----------------------------------------------------------------
> >>>>>> >> Oleg Nesterov (1):
> >>>>>> >> rcu: Change _wait_rcu_gp() to work around GCC bug 67055
> >>>>>> >>
> >>>>>> >> Paul E. McKenney (1):
> >>>>>> >> rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex
> >>>>>> >>
> >>>>>> >> include/linux/rcupdate.h | 11 +++++------
> >>>>>> >> kernel/rcu/tree.c | 5 +++++
> >>>>>> >> 2 files changed, 10 insertions(+), 6 deletions(-)"
> >>>>>> >>
> >>>>>> >> So, I have stolen them from linux-next.git.
> >>>>>> >>
> >>>>>> >> Please look at this, Thanks.
> >>>>>> >
> >>>>>> > Does it work better now? (Forgot to actually push the new name...)
> >>>>>>
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> now the for-mingo Git branch has these two fixes.
> >>>>>
> >>>>> Whew!!! Apologies for the hassle!
> >>>>>
> >>>>>> I just booted into my new kernel with the "stolen" rcu.fixes from -next.
> >>>>>>
> >>>>>> For the lockdep problems I will build with CONFIG_DEBUG_LOCKDEP=y to see if
> >>>>>> I get some more info on the workqueue trouble.
> >>>>>>
> >>>>>> [ 23.874836] BUG: sleeping function called from invalid context at
> >>>>>> kernel/workqueue.c:2678
> >>>>>> [ 23.874902] in_atomic(): 0, irqs_disabled(): 1, pid: 1411, name: acpid
> >>>>>
> >>>>> Did you get a stack trace? There are quite a few potential callers of
> >>>>> start_flush_work() via flush_work().
> >>>>>
> >>>>
> >>>> Hi Paul :-),
> >>>>
> >>>> Here is the stack trace.
> >>>>
> >>>> [ 23.045871] BUG: sleeping function called from invalid context at
> >>>> kernel/workqueue.c:2678
> >>>> [ 23.045982] in_atomic(): 0, irqs_disabled(): 1, pid: 1399, name: acpid
> >>>> [ 23.046064] 3 locks held by acpid/1399:
> >>>> [ 23.046066] #0: (&evdev->mutex){+.+...}, at: [<ffffffff8174ac7c>]
> >>>> evdev_release+0xbc/0xf0
> >>>> [ 23.046081] #1: (&dev->mutex#2){+.+...}, at: [<ffffffff81742397>]
> >>>> input_close_device+0x27/0x70
> >>>> [ 23.046093] #2: (hid_open_mut){+.+...}, at: [<ffffffffa0056388>]
> >>>> usbhid_close+0x28/0xb0 [usbhid]
> >>>> [ 23.046106] irq event stamp: 3306
> >>>> [ 23.046109] hardirqs last enabled at (3305): [<ffffffff8192ae32>]
> >>>> _raw_spin_unlock_irq+0x32/0x60
> >>>> [ 23.046115] hardirqs last disabled at (3306): [<ffffffff81121017>]
> >>>> del_timer_sync+0x37/0x110
> >>>> [ 23.046122] softirqs last enabled at (2704): [<ffffffff818b12c9>]
> >>>> local_bh_enable+0x9/0x20
> >>>> [ 23.046128] softirqs last disabled at (2702): [<ffffffff818b12a9>]
> >>>> local_bh_disable+0x9/0x20
> >>>> [ 23.046136] CPU: 2 PID: 1399 Comm: acpid Not tainted
> >>>> 4.3.0-rc3-3-llvmlinux-amd64 #1
> >>>> [ 23.046139] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> >>>> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> >>>> [ 23.046143] ffff8800d36ee948 0000000000000092 0000000000000000
> >>>> ffff8800bbacfae8
> >>>> [ 23.046151] ffffffff8149adad ffff8800bbacfb18 ffffffff810cd1ea
> >>>> ffffffff81c56f0a
> >>>> [ 23.046158] ffff8800c22dc400 0000000000000000 0000000000000a76
> >>>> ffff8800bbacfb58
> >>>> [ 23.046165] Call Trace:
> >>>> [ 23.046172] [<ffffffff8149adad>] dump_stack+0x7d/0xa0
> >>>> [ 23.046177] [<ffffffff810cd1ea>] ___might_sleep+0x28a/0x2a0
> >>>> [ 23.046182] [<ffffffff810cceef>] __might_sleep+0x4f/0xc0
> >>>> [ 23.046187] [<ffffffff810afbff>] start_flush_work+0x2f/0x290
> >>>> [ 23.046192] [<ffffffff810afbac>] flush_work+0x5c/0x80
> >>>> [ 23.046195] [<ffffffff810afb6a>] ? flush_work+0x1a/0x80
> >>>> [ 23.046202] [<ffffffff810eed0d>] ? trace_hardirqs_off+0xd/0x10
> >>>> [ 23.046206] [<ffffffff810aecc8>] ? try_to_grab_pending+0x48/0x360
> >>>> [ 23.046211] [<ffffffff8192ac53>] ? _raw_spin_lock_irqsave+0x73/0x80
> >>>> [ 23.046216] [<ffffffff810afff9>] __cancel_work_timer+0x179/0x260
> >>>> [ 23.046221] [<ffffffff8192add2>] ? _raw_spin_unlock_irqrestore+0x52/0x80
> >>>> [ 23.046226] [<ffffffff81120fcd>] ? try_to_del_timer_sync+0xad/0xc0
> >>>> [ 23.046230] [<ffffffff810afe78>] cancel_work_sync+0x18/0x20
> >>>> [ 23.046237] [<ffffffffa00563d5>] usbhid_close+0x75/0xb0 [usbhid]
> >>>> [ 23.046245] [<ffffffffa00394d1>] hidinput_close+0x31/0x40 [hid]
> >>>> [ 23.046251] [<ffffffffa00394a0>] ? hidinput_open+0x40/0x40 [hid]
> >>>> [ 23.046256] [<ffffffff817423b8>] input_close_device+0x48/0x70
> >>>> [ 23.046261] [<ffffffff8174ac96>] evdev_release+0xd6/0xf0
> >>>> [ 23.046267] [<ffffffff812728c7>] __fput+0x107/0x240
> >>>> [ 23.046271] [<ffffffff81272756>] ____fput+0x16/0x20
> >>>> [ 23.046276] [<ffffffff810b945c>] task_work_run+0x6c/0xe0
> >>>> [ 23.046282] [<ffffffff81003aa7>] prepare_exit_to_usermode+0x117/0x120
> >>>> [ 23.046287] [<ffffffff81003ce1>] syscall_return_slowpath+0x231/0x2a0
> >>>> [ 23.046292] [<ffffffff8126efa5>] ? filp_close+0x65/0x90
> >>>> [ 23.046298] [<ffffffff810ef1c9>] ? trace_hardirqs_on_caller+0x19/0x290
> >>>> [ 23.046303] [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
> >>>> [ 23.046308] [<ffffffff8192bb62>] int_ret_from_sys_call+0x25/0x9f
> >>>>
> >>>> Can you give help on how to "debug" this?
> >>>>
> >>>> I switched from full-dynticks to simple cpu-accounting, which did not help.
> >>>> But this was only a suspicion, as Jiri pointed to the possibility that
> >>>> del_timer_sync() could have been mis-optimized.
> >>>>
> >>>> So, more empty head here.
> >>>>
> >>>
> >>> Forgot to attach dmesg-log and my kernel-config.
> >>> Sorry about that.
> >>>
> >>
> >> Time for a pause!
> >>
> >> Forgot to attach disassembled kernel/workqueue.o.
> >>
> >
> > When looking at start_flush_work() in kernel/workqueue.c...
> > ...I remembered the comments of Lai Jiangshan concerning the
> > might_sleep() check there.
> >
> > I tried to move the might_sleep() line to __cancel_work_timer() as
> > requested, but that neither helped nor narrowed anything down.
> >
> > Please see [1] for more details.
> >
> > - Sedat -
> >
> > [1] http://marc.info/?l=linux-kernel&m=144184707824750&w=2
>
> Sorry, before you dig too deeply into this.
>
> Jiri made a good analysis on the hid side looking at my stack trace... see [2].
Good. Might be that clang/LLVM needs some help here. ;-)
Thanx, Paul
> - Sedat -
>
> [2] http://marc.info/?l=linux-kernel&m=144308152407025&w=2
>
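
The check that fired in the quoted trace ("in_atomic(): 0, irqs_disabled(): 1") can be modelled in userspace roughly as below. This is only a minimal sketch under stated assumptions, not the kernel's code: struct cpu_ctx, model_might_sleep_splat() and model_irq_save_restore() are hypothetical names standing in for the per-CPU state, the ___might_sleep() test, and a local_irq_save()/local_irq_restore() pair. It shows how a restore that never happens leaves interrupts marked disabled, so that a later sleeping call is flagged even though the preempt count is zero. It does not claim that this is what actually went wrong in the report.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU kernel state; not a kernel API. */
struct cpu_ctx {
    int  preempt_count;   /* nonzero models in_atomic() returning 1 */
    bool irqs_disabled;   /* models irqs_disabled() */
};

/*
 * Simplified model of the might_sleep() test: a sleeping call is
 * flagged when the context is atomic or interrupts are disabled.
 * (The real check also looks at system state, idle tasks, etc.)
 */
static bool model_might_sleep_splat(const struct cpu_ctx *ctx)
{
    return ctx->preempt_count != 0 || ctx->irqs_disabled;
}

/*
 * Models a save/disable/restore sequence around a critical section.
 * forget_restore simulates the restore step being lost, for whatever
 * reason; this parameter is purely illustrative.
 */
static void model_irq_save_restore(struct cpu_ctx *ctx, bool forget_restore)
{
    bool saved = ctx->irqs_disabled;    /* local_irq_save() analogue */

    ctx->irqs_disabled = true;
    /* ... critical section would run here ... */
    if (!forget_restore)
        ctx->irqs_disabled = saved;     /* local_irq_restore() analogue */
}
```

With a balanced save/restore the model reports no problem; with the restore skipped it reports one while preempt_count stays 0, which matches the shape of the "in_atomic(): 0, irqs_disabled(): 1" line in the report.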