From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
To: lkp@lists.01.org
Subject: Re: [list] 8052ce2d97: WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71
Date: Thu, 08 Sep 2016 02:27:12 -0700 [thread overview]
Message-ID: <20160908092712.GI3663@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAGXu5jJFNymPhRp=1Zp829-_wmT2XwzR0Lej2hy_CExQrG1Y5g@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 11214 bytes --]
On Wed, Sep 07, 2016 at 10:25:58AM -0700, Kees Cook wrote:
> On Wed, Sep 7, 2016 at 12:31 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Tue, Sep 06, 2016 at 11:49:04AM -0400, Kees Cook wrote:
> >> On Sun, Sep 4, 2016 at 5:48 AM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Sun, Sep 04, 2016 at 08:35:40AM +0800, kernel test robot wrote:
> >> >>
> >> >> FYI, we noticed the following commit:
> >> >>
> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2016.08.19a
> >> >> commit 8052ce2d9771ab5a818307f3abbaf33bba82a631 ("list: Split list_add() debug checking into separate function")
> >> >>
> >> >> in testcase: boot
> >> >>
> >> >> on test machine: qemu-system-i386 -enable-kvm -m 360M
> >> >>
> >> >> caused below changes:
> >> >
> >> > Hello, Kees,
> >> >
> >> > Looks like your change might have actually found a bug. ;-)
> >> >
> >> > How would you like to handle this?
> >>
> >> We should fix the bug! :)
> >
> > But of course...
> >
> > The question is whether I am considered to be fully responsible for
> > chasing down any bugs that arise. It is looking like I am, in which
> > case I must defer this patch to the 4.10 merge window. Things are a
> > bit crazy now due to prior overcommitments...
>
> I'm going to double-check the code, but the changes I made shouldn't
> have changed any of the logical checks under CONFIG_DEBUG_LIST... just
> the manifestation of WARN vs BUG. So, since both this and the BPF RCU
> bug have appeared, something has changed... I will go digging...
So maybe the boot failures are a result of the WARN->BUG transition?
Thanx, Paul
> -Kees
>
> > Yes, yes, they will be crazy then as well, but this would be higher
> > on the priority list.
> >
> > Thanx, Paul
> >
> >> >> +------------------------------------------------------------+------------+------------+
> >> >> | | 0a33d912a3 | 8052ce2d97 |
> >> >> +------------------------------------------------------------+------------+------------+
> >> >> | boot_successes | 0 | 0 |
> >> >> | boot_failures | 211 | 18 |
> >> >> | WARNING:at_lib/list_debug.c:#__list_del_entry | 211 | 18 |
> >> >> | calltrace:init | 209 | 18 |
> >> >> | WARNING:at_lib/list_debug.c:#__list_add | 209 | |
> >> >> | calltrace:i8042_init | 209 | 17 |
> >> >> | calltrace:SyS_reboot | 192 | 4 |
> >> >> | EIP_is_at__mutex_unlock_slowpath | 29 | |
> >> >> | Kernel_panic-not_syncing:softlockup:hung_tasks | 138 | |
> >> >> | BUG:workqueue_lockup-pool_cpus=#flags=#nice=#stuck_for#s | 56 | |
> >> >> | EIP_is_at_lock_release | 7 | |
> >> >> | EIP_is_at_lock_acquire | 28 | |
> >> >> | EIP_is_at_raw_spin_unlock_irq | 30 | |
> >> >> | EIP_is_at_mutex_lock_nested | 21 | |
> >> >> | EIP_is_at_lock_is_held | 10 | |
> >> >> | EIP_is_at_device_shutdown | 3 | |
> >> >> | EIP_is_at___might_sleep | 5 | |
> >> >> | EIP_is_at__list_del_entry | 4 | 1 |
> >> >> | EIP_is_at_get_device | 1 | |
> >> >> | BUG:unable_to_handle_kernel | 7 | 5 |
> >> >> | Oops | 7 | 5 |
> >> >> | calltrace:SyS_ipc | 4 | 4 |
> >> >> | Kernel_panic-not_syncing:Fatal_exception | 7 | 5 |
> >> >> | EIP_is_at_pm_runtime_barrier | 1 | |
> >> >> | EIP_is_at_lock_acquired | 1 | |
> >> >> | EIP_is_at__might_sleep | 1 | |
> >> >> | BUG:workqueue_lockup-pool_cpus=#-#flags=#nice=#stuck_for#s | 1 | |
> >> >> | WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page | 26 | |
> >> >> | calltrace:mark_rodata_ro | 26 | |
> >> >> | WARNING:at_lib/list_debug.c:#__list_add_valid | 0 | 17 |
> >> >> | calltrace:serio_handle_event | 0 | 17 |
> >> >> | calltrace:userio_misc_init | 0 | 17 |
> >> >> | calltrace:mousedev_init | 0 | 17 |
> >> >> | calltrace:evdev_init | 0 | 17 |
> >> >> | calltrace:cmos_init | 0 | 17 |
> >> >> | calltrace:test_init | 0 | 17 |
> >> >> | calltrace:pca_isa_driver_init | 0 | 17 |
> >> >> | calltrace:vim2m_init | 0 | 17 |
> >> >> | calltrace:fm_drv_init | 0 | 17 |
> >> >> | calltrace:test_power_init | 0 | 17 |
> >> >> | calltrace:raid#_init | 0 | 17 |
> >> >> | calltrace:dm_init | 0 | 17 |
> >> >> | calltrace:vhci_init | 0 | 17 |
> >> >> | calltrace:dcdrbu_init | 0 | 17 |
> >> >> | calltrace:dcdbas_init | 0 | 17 |
> >> >> | calltrace:uhid_init | 0 | 17 |
> >> >> | calltrace:vhost_net_init | 0 | 17 |
> >> >> | calltrace:dummy_stm_init | 0 | 17 |
> >> >> | calltrace:stm_heartbeat_init | 0 | 17 |
> >> >> | calltrace:binder_init | 0 | 17 |
> >> >> | calltrace:teql_init | 0 | 17 |
> >> >> | calltrace:ipip_init | 0 | 17 |
> >> >> | calltrace:ipgre_init | 0 | 17 |
> >> >> | calltrace:sit_init | 0 | 17 |
> >> >> | calltrace:ip6_tunnel_init | 0 | 17 |
> >> >> | calltrace:ip6gre_init | 0 | 17 |
> >> >> | calltrace:irlan_init | 0 | 17 |
> >> >> | calltrace:pm_qos_power_init | 0 | 17 |
> >> >> | calltrace:ubi_init | 0 | 17 |
> >> >> | calltrace:SyS_setsockopt | 0 | 2 |
> >> >> | calltrace:SyS_socketcall | 0 | 2 |
> >> >> | WARNING:at_include/linux/kref.h:#kobject_get | 0 | 4 |
> >> >> | WARNING:at_kernel/locking/lockdep.c:#__lock_acquire | 0 | 4 |
> >> >> | BUG:kernel_hang_in_test_stage | 0 | 4 |
> >> >> | calltrace:fw_cfg_sysfs_init | 0 | 7 |
> >> >> +------------------------------------------------------------+------------+------------+
> >> >>
> >> >>
> >> >>
> >> >> [ 152.680291] g_acm_ms usbip-vudc.0: failed to start g_acm_ms: -22
> >> >> [ 152.689209] usbip-vudc: probe of usbip-vudc.0 failed with error -22
> >> >> [ 152.698509] ------------[ cut here ]------------
> >> >> [ 152.709786] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71
> >> >> [ 152.722785] list_del corruption. next->prev should be cc0230d4, but was 6b6b6b6b
> >>
> >> That's the poison value, so this is a remove-after-remove bug?
> >>
> >> >> [ 152.732828] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00033-g8052ce2 #1
> >> >> [ 152.742576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
> >> >> [ 152.754081] 00000000 00200246 c0033e78 c13c456b c0033ea4 c13e72ac c0033e90 c103fd02
> >> >> [ 152.765720] 0000003c 6b6b6b6b cc0230d4 cc02300c c0033eac c103fd41 00000009 00000000
> >> >> [ 152.776864] c0033ea4 c1f7bc14 c0033ec0 c0033ecc c13e72ac c1f7bace 0000003c c1f7bc14
> >> >> [ 152.788472] Call Trace:
> >> >> [ 152.791964] [<c13c456b>] dump_stack+0x75/0xa9
> >> >> [ 152.797835] [<c13e72ac>] ? __list_del_entry+0x4d/0x71
> >> >> [ 152.804221] [<c103fd02>] __warn+0xad/0xc4
> >> >> [ 152.809694] [<c103fd41>] warn_slowpath_fmt+0x28/0x2d
> >> >> [ 152.816501] [<c13e72ac>] __list_del_entry+0x4d/0x71
> >> >> [ 152.822946] [<c15027a8>] device_pm_remove+0x2b/0x5a
> >> >> [ 152.829560] [<c14f6317>] device_del+0x134/0x19a
> >> >> [ 152.835558] [<c14f5c38>] ? dev_set_name+0x14/0x16
> >> >> [ 152.842135] [<c14f9d7c>] platform_device_del+0x16/0x5d
> >> >> [ 152.848909] [<c2207515>] init+0xe5/0x112
> >> >> [ 152.854149] [<c2207430>] ? usbip_host_init+0xe5/0xe5
> >>
> >> What device is this? Looks like it's being removed right at init time??
> >>
> >> >> [ 152.860796] [<c21cec35>] do_one_initcall+0x7e/0x101
> >> >> [ 152.867316] [<c10564b7>] ? parse_args+0x190/0x265
> >> >> [ 152.873497] [<c21ced88>] ? kernel_init_freeable+0xd0/0x174
> >> >> [ 152.880923] [<c21ceda4>] kernel_init_freeable+0xec/0x174
> >> >> [ 152.887499] [<c1be421b>] kernel_init+0x8/0xd0
> >> >> [ 152.893658] [<c1bec3ae>] ret_from_kernel_thread+0xe/0x30
> >> >> [ 152.900759] [<c1be4213>] ? rest_init+0x111/0x111
> >> >> [ 152.907334] ---[ end trace 2003739b08d25430 ]---
> >> >> [ 152.920040] ------------[ cut here ]------------
> >>
> >> -Kees
> >>
> >>
> >> --
> >> Kees Cook
> >> Nexus Security
> >>
> >
>
>
>
> --
> Kees Cook
> Nexus Security
>
WARNING: multiple messages have this Message-ID (diff)
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Kees Cook <keescook@chromium.org>
Cc: kernel test robot <xiaolong.ye@intel.com>, LKP <lkp@01.org>,
LKML <linux-kernel@vger.kernel.org>,
Rik van Riel <riel@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [list] 8052ce2d97: WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71
Date: Thu, 8 Sep 2016 02:27:12 -0700 [thread overview]
Message-ID: <20160908092712.GI3663@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAGXu5jJFNymPhRp=1Zp829-_wmT2XwzR0Lej2hy_CExQrG1Y5g@mail.gmail.com>
On Wed, Sep 07, 2016 at 10:25:58AM -0700, Kees Cook wrote:
> On Wed, Sep 7, 2016 at 12:31 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Tue, Sep 06, 2016 at 11:49:04AM -0400, Kees Cook wrote:
> >> On Sun, Sep 4, 2016 at 5:48 AM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Sun, Sep 04, 2016 at 08:35:40AM +0800, kernel test robot wrote:
> >> >>
> >> >> FYI, we noticed the following commit:
> >> >>
> >> >> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2016.08.19a
> >> >> commit 8052ce2d9771ab5a818307f3abbaf33bba82a631 ("list: Split list_add() debug checking into separate function")
> >> >>
> >> >> in testcase: boot
> >> >>
> >> >> on test machine: qemu-system-i386 -enable-kvm -m 360M
> >> >>
> >> >> caused below changes:
> >> >
> >> > Hello, Kees,
> >> >
> >> > Looks like your change might have actually found a bug. ;-)
> >> >
> >> > How would you like to handle this?
> >>
> >> We should fix the bug! :)
> >
> > But of course...
> >
> > The question is whether I am considered to be fully responsible for
> > chasing down any bugs that arise. It is looking like I am, in which
> > case I must defer this patch to the 4.10 merge window. Things are a
> > bit crazy now due to prior overcommitments...
>
> I'm going to double-check the code, but the changes I made shouldn't
> have changed any of the logical checks under CONFIG_DEBUG_LIST... just
> the manifestation of WARN vs BUG. So, since both this and the BPF RCU
> bug have appeared, something has changed... I will go digging...
So maybe the boot failures are a result of the WARN->BUG transition?
Thanx, Paul
> -Kees
>
> > Yes, yes, they will be crazy then as well, but this would be higher
> > on the priority list.
> >
> > Thanx, Paul
> >
> >> >> +------------------------------------------------------------+------------+------------+
> >> >> | | 0a33d912a3 | 8052ce2d97 |
> >> >> +------------------------------------------------------------+------------+------------+
> >> >> | boot_successes | 0 | 0 |
> >> >> | boot_failures | 211 | 18 |
> >> >> | WARNING:at_lib/list_debug.c:#__list_del_entry | 211 | 18 |
> >> >> | calltrace:init | 209 | 18 |
> >> >> | WARNING:at_lib/list_debug.c:#__list_add | 209 | |
> >> >> | calltrace:i8042_init | 209 | 17 |
> >> >> | calltrace:SyS_reboot | 192 | 4 |
> >> >> | EIP_is_at__mutex_unlock_slowpath | 29 | |
> >> >> | Kernel_panic-not_syncing:softlockup:hung_tasks | 138 | |
> >> >> | BUG:workqueue_lockup-pool_cpus=#flags=#nice=#stuck_for#s | 56 | |
> >> >> | EIP_is_at_lock_release | 7 | |
> >> >> | EIP_is_at_lock_acquire | 28 | |
> >> >> | EIP_is_at_raw_spin_unlock_irq | 30 | |
> >> >> | EIP_is_at_mutex_lock_nested | 21 | |
> >> >> | EIP_is_at_lock_is_held | 10 | |
> >> >> | EIP_is_at_device_shutdown | 3 | |
> >> >> | EIP_is_at___might_sleep | 5 | |
> >> >> | EIP_is_at__list_del_entry | 4 | 1 |
> >> >> | EIP_is_at_get_device | 1 | |
> >> >> | BUG:unable_to_handle_kernel | 7 | 5 |
> >> >> | Oops | 7 | 5 |
> >> >> | calltrace:SyS_ipc | 4 | 4 |
> >> >> | Kernel_panic-not_syncing:Fatal_exception | 7 | 5 |
> >> >> | EIP_is_at_pm_runtime_barrier | 1 | |
> >> >> | EIP_is_at_lock_acquired | 1 | |
> >> >> | EIP_is_at__might_sleep | 1 | |
> >> >> | BUG:workqueue_lockup-pool_cpus=#-#flags=#nice=#stuck_for#s | 1 | |
> >> >> | WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page | 26 | |
> >> >> | calltrace:mark_rodata_ro | 26 | |
> >> >> | WARNING:at_lib/list_debug.c:#__list_add_valid | 0 | 17 |
> >> >> | calltrace:serio_handle_event | 0 | 17 |
> >> >> | calltrace:userio_misc_init | 0 | 17 |
> >> >> | calltrace:mousedev_init | 0 | 17 |
> >> >> | calltrace:evdev_init | 0 | 17 |
> >> >> | calltrace:cmos_init | 0 | 17 |
> >> >> | calltrace:test_init | 0 | 17 |
> >> >> | calltrace:pca_isa_driver_init | 0 | 17 |
> >> >> | calltrace:vim2m_init | 0 | 17 |
> >> >> | calltrace:fm_drv_init | 0 | 17 |
> >> >> | calltrace:test_power_init | 0 | 17 |
> >> >> | calltrace:raid#_init | 0 | 17 |
> >> >> | calltrace:dm_init | 0 | 17 |
> >> >> | calltrace:vhci_init | 0 | 17 |
> >> >> | calltrace:dcdrbu_init | 0 | 17 |
> >> >> | calltrace:dcdbas_init | 0 | 17 |
> >> >> | calltrace:uhid_init | 0 | 17 |
> >> >> | calltrace:vhost_net_init | 0 | 17 |
> >> >> | calltrace:dummy_stm_init | 0 | 17 |
> >> >> | calltrace:stm_heartbeat_init | 0 | 17 |
> >> >> | calltrace:binder_init | 0 | 17 |
> >> >> | calltrace:teql_init | 0 | 17 |
> >> >> | calltrace:ipip_init | 0 | 17 |
> >> >> | calltrace:ipgre_init | 0 | 17 |
> >> >> | calltrace:sit_init | 0 | 17 |
> >> >> | calltrace:ip6_tunnel_init | 0 | 17 |
> >> >> | calltrace:ip6gre_init | 0 | 17 |
> >> >> | calltrace:irlan_init | 0 | 17 |
> >> >> | calltrace:pm_qos_power_init | 0 | 17 |
> >> >> | calltrace:ubi_init | 0 | 17 |
> >> >> | calltrace:SyS_setsockopt | 0 | 2 |
> >> >> | calltrace:SyS_socketcall | 0 | 2 |
> >> >> | WARNING:at_include/linux/kref.h:#kobject_get | 0 | 4 |
> >> >> | WARNING:at_kernel/locking/lockdep.c:#__lock_acquire | 0 | 4 |
> >> >> | BUG:kernel_hang_in_test_stage | 0 | 4 |
> >> >> | calltrace:fw_cfg_sysfs_init | 0 | 7 |
> >> >> +------------------------------------------------------------+------------+------------+
> >> >>
> >> >>
> >> >>
> >> >> [ 152.680291] g_acm_ms usbip-vudc.0: failed to start g_acm_ms: -22
> >> >> [ 152.689209] usbip-vudc: probe of usbip-vudc.0 failed with error -22
> >> >> [ 152.698509] ------------[ cut here ]------------
> >> >> [ 152.709786] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71
> >> >> [ 152.722785] list_del corruption. next->prev should be cc0230d4, but was 6b6b6b6b
> >>
> >> That's the poison value, so this is a remove-after-remove bug?
> >>
> >> >> [ 152.732828] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00033-g8052ce2 #1
> >> >> [ 152.742576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
> >> >> [ 152.754081] 00000000 00200246 c0033e78 c13c456b c0033ea4 c13e72ac c0033e90 c103fd02
> >> >> [ 152.765720] 0000003c 6b6b6b6b cc0230d4 cc02300c c0033eac c103fd41 00000009 00000000
> >> >> [ 152.776864] c0033ea4 c1f7bc14 c0033ec0 c0033ecc c13e72ac c1f7bace 0000003c c1f7bc14
> >> >> [ 152.788472] Call Trace:
> >> >> [ 152.791964] [<c13c456b>] dump_stack+0x75/0xa9
> >> >> [ 152.797835] [<c13e72ac>] ? __list_del_entry+0x4d/0x71
> >> >> [ 152.804221] [<c103fd02>] __warn+0xad/0xc4
> >> >> [ 152.809694] [<c103fd41>] warn_slowpath_fmt+0x28/0x2d
> >> >> [ 152.816501] [<c13e72ac>] __list_del_entry+0x4d/0x71
> >> >> [ 152.822946] [<c15027a8>] device_pm_remove+0x2b/0x5a
> >> >> [ 152.829560] [<c14f6317>] device_del+0x134/0x19a
> >> >> [ 152.835558] [<c14f5c38>] ? dev_set_name+0x14/0x16
> >> >> [ 152.842135] [<c14f9d7c>] platform_device_del+0x16/0x5d
> >> >> [ 152.848909] [<c2207515>] init+0xe5/0x112
> >> >> [ 152.854149] [<c2207430>] ? usbip_host_init+0xe5/0xe5
> >>
> >> What device is this? Looks like it's being removed right at init time??
> >>
> >> >> [ 152.860796] [<c21cec35>] do_one_initcall+0x7e/0x101
> >> >> [ 152.867316] [<c10564b7>] ? parse_args+0x190/0x265
> >> >> [ 152.873497] [<c21ced88>] ? kernel_init_freeable+0xd0/0x174
> >> >> [ 152.880923] [<c21ceda4>] kernel_init_freeable+0xec/0x174
> >> >> [ 152.887499] [<c1be421b>] kernel_init+0x8/0xd0
> >> >> [ 152.893658] [<c1bec3ae>] ret_from_kernel_thread+0xe/0x30
> >> >> [ 152.900759] [<c1be4213>] ? rest_init+0x111/0x111
> >> >> [ 152.907334] ---[ end trace 2003739b08d25430 ]---
> >> >> [ 152.920040] ------------[ cut here ]------------
> >>
> >> -Kees
> >>
> >>
> >> --
> >> Kees Cook
> >> Nexus Security
> >>
> >
>
>
>
> --
> Kees Cook
> Nexus Security
>
next prev parent reply other threads:[~2016-09-08 9:27 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-04 0:35 [list] 8052ce2d97: WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71 kernel test robot
2016-09-04 9:48 ` Paul E. McKenney
2016-09-04 9:48 ` Paul E. McKenney
2016-09-06 15:49 ` Kees Cook
2016-09-06 15:49 ` Kees Cook
2016-09-07 7:31 ` Paul E. McKenney
2016-09-07 7:31 ` Paul E. McKenney
2016-09-07 17:25 ` Kees Cook
2016-09-07 17:25 ` Kees Cook
2016-09-08 9:27 ` Paul E. McKenney [this message]
2016-09-08 9:27 ` Paul E. McKenney
2016-09-08 15:52 ` Kees Cook
2016-09-08 15:52 ` Kees Cook
2016-10-22 6:08 ` Kees Cook
2016-10-22 6:08 ` Kees Cook
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160908092712.GI3663@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=lkp@lists.01.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.