From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965689AbcIHJ1U (ORCPT ); Thu, 8 Sep 2016 05:27:20 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:55443 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S937312AbcIHJ1Q (ORCPT ); Thu, 8 Sep 2016 05:27:16 -0400 X-IBM-Helo: d03dlp01.boulder.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com Date: Thu, 8 Sep 2016 02:27:12 -0700 From: "Paul E. McKenney" To: Kees Cook Cc: kernel test robot , LKP , LKML , Rik van Riel , Steven Rostedt Subject: Re: [list] 8052ce2d97: WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71 Reply-To: paulmck@linux.vnet.ibm.com References: <57cb6c5c.ht1t4TgNR/5nHu0Z%xiaolong.ye@intel.com> <20160904094851.GM3663@linux.vnet.ibm.com> <20160907073153.GR3663@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16090809-0024-0000-0000-00001480AD54 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00005727; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000185; SDB=6.00755278; UDB=6.00357576; IPR=6.00528193; BA=6.00004702; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00012618; XFM=3.00000011; UTC=2016-09-08 09:27:13 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16090809-0025-0000-0000-00004445DA5B Message-Id: <20160908092712.GI3663@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-09-08_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1609080137 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 07, 2016 at 10:25:58AM -0700, Kees Cook wrote: > On Wed, Sep 7, 2016 at 12:31 AM, Paul E. McKenney > wrote: > > On Tue, Sep 06, 2016 at 11:49:04AM -0400, Kees Cook wrote: > >> On Sun, Sep 4, 2016 at 5:48 AM, Paul E. McKenney > >> wrote: > >> > On Sun, Sep 04, 2016 at 08:35:40AM +0800, kernel test robot wrote: > >> >> > >> >> FYI, we noticed the following commit: > >> >> > >> >> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2016.08.19a > >> >> commit 8052ce2d9771ab5a818307f3abbaf33bba82a631 ("list: Split list_add() debug checking into separate function") > >> >> > >> >> in testcase: boot > >> >> > >> >> on test machine: qemu-system-i386 -enable-kvm -m 360M > >> >> > >> >> caused below changes: > >> > > >> > Hello, Kees, > >> > > >> > Looks like your change might have actually found a bug. ;-) > >> > > >> > How would you like to handle this? > >> > >> We should fix the bug! :) > > > > But of course... > > > > The question is whether I am considered to be fully responsible for > > chasing down any bugs that arise. It is looking like I am, in which > > case I must defer this patch to the 4.10 merge window. Things are a > > bit crazy now due to prior overcommitments... > > I'm going to double-check the code, but the changes I made shouldn't > have changed any of the logical checks under CONFIG_DEBUG_LIST... just > the manifestation of WARN vs BUG. So, since both this and the BPF RCU > bug have appeared, something has changed... I will go digging... So maybe the boot failures are a result of the WARN->BUG transition? Thanx, Paul > -Kees > > > Yes, yes, they will be crazy then as well, but this would be higher > > on the priority list. > > > > Thanx, Paul > > > >> >> +------------------------------------------------------------+------------+------------+ > >> >> | | 0a33d912a3 | 8052ce2d97 | > >> >> +------------------------------------------------------------+------------+------------+ > >> >> | boot_successes | 0 | 0 | > >> >> | boot_failures | 211 | 18 | > >> >> | WARNING:at_lib/list_debug.c:#__list_del_entry | 211 | 18 | > >> >> | calltrace:init | 209 | 18 | > >> >> | WARNING:at_lib/list_debug.c:#__list_add | 209 | | > >> >> | calltrace:i8042_init | 209 | 17 | > >> >> | calltrace:SyS_reboot | 192 | 4 | > >> >> | EIP_is_at__mutex_unlock_slowpath | 29 | | > >> >> | Kernel_panic-not_syncing:softlockup:hung_tasks | 138 | | > >> >> | BUG:workqueue_lockup-pool_cpus=#flags=#nice=#stuck_for#s | 56 | | > >> >> | EIP_is_at_lock_release | 7 | | > >> >> | EIP_is_at_lock_acquire | 28 | | > >> >> | EIP_is_at_raw_spin_unlock_irq | 30 | | > >> >> | EIP_is_at_mutex_lock_nested | 21 | | > >> >> | EIP_is_at_lock_is_held | 10 | | > >> >> | EIP_is_at_device_shutdown | 3 | | > >> >> | EIP_is_at___might_sleep | 5 | | > >> >> | EIP_is_at__list_del_entry | 4 | 1 | > >> >> | EIP_is_at_get_device | 1 | | > >> >> | BUG:unable_to_handle_kernel | 7 | 5 | > >> >> | Oops | 7 | 5 | > >> >> | calltrace:SyS_ipc | 4 | 4 | > >> >> | Kernel_panic-not_syncing:Fatal_exception | 7 | 5 | > >> >> | EIP_is_at_pm_runtime_barrier | 1 | | > >> >> | EIP_is_at_lock_acquired | 1 | | > >> >> | EIP_is_at__might_sleep | 1 | | > >> >> | BUG:workqueue_lockup-pool_cpus=#-#flags=#nice=#stuck_for#s | 1 | | > >> >> | WARNING:at_arch/x86/mm/dump_pagetables.c:#note_page | 26 | | > >> >> | calltrace:mark_rodata_ro | 26 | | > >> >> | WARNING:at_lib/list_debug.c:#__list_add_valid | 0 | 17 | > >> >> | calltrace:serio_handle_event | 0 | 17 | > >> >> | calltrace:userio_misc_init | 0 | 17 | > >> >> | calltrace:mousedev_init | 0 | 17 | > >> >> | calltrace:evdev_init | 0 | 17 | > >> >> | calltrace:cmos_init | 0 | 17 | > >> >> | calltrace:test_init | 0 | 17 | > >> >> | calltrace:pca_isa_driver_init | 0 | 17 | > >> >> | calltrace:vim2m_init | 0 | 17 | > >> >> | calltrace:fm_drv_init | 0 | 17 | > >> >> | calltrace:test_power_init | 0 | 17 | > >> >> | calltrace:raid#_init | 0 | 17 | > >> >> | calltrace:dm_init | 0 | 17 | > >> >> | calltrace:vhci_init | 0 | 17 | > >> >> | calltrace:dcdrbu_init | 0 | 17 | > >> >> | calltrace:dcdbas_init | 0 | 17 | > >> >> | calltrace:uhid_init | 0 | 17 | > >> >> | calltrace:vhost_net_init | 0 | 17 | > >> >> | calltrace:dummy_stm_init | 0 | 17 | > >> >> | calltrace:stm_heartbeat_init | 0 | 17 | > >> >> | calltrace:binder_init | 0 | 17 | > >> >> | calltrace:teql_init | 0 | 17 | > >> >> | calltrace:ipip_init | 0 | 17 | > >> >> | calltrace:ipgre_init | 0 | 17 | > >> >> | calltrace:sit_init | 0 | 17 | > >> >> | calltrace:ip6_tunnel_init | 0 | 17 | > >> >> | calltrace:ip6gre_init | 0 | 17 | > >> >> | calltrace:irlan_init | 0 | 17 | > >> >> | calltrace:pm_qos_power_init | 0 | 17 | > >> >> | calltrace:ubi_init | 0 | 17 | > >> >> | calltrace:SyS_setsockopt | 0 | 2 | > >> >> | calltrace:SyS_socketcall | 0 | 2 | > >> >> | WARNING:at_include/linux/kref.h:#kobject_get | 0 | 4 | > >> >> | WARNING:at_kernel/locking/lockdep.c:#__lock_acquire | 0 | 4 | > >> >> | BUG:kernel_hang_in_test_stage | 0 | 4 | > >> >> | calltrace:fw_cfg_sysfs_init | 0 | 7 | > >> >> +------------------------------------------------------------+------------+------------+ > >> >> > >> >> > >> >> > >> >> [ 152.680291] g_acm_ms usbip-vudc.0: failed to start g_acm_ms: -22 > >> >> [ 152.689209] usbip-vudc: probe of usbip-vudc.0 failed with error -22 > >> >> [ 152.698509] ------------[ cut here ]------------ > >> >> [ 152.709786] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:60 __list_del_entry+0x4d/0x71 > >> >> [ 152.722785] list_del corruption. next->prev should be cc0230d4, but was 6b6b6b6b > >> > >> That's the poison value, so this is a remove-after-remove bug? > >> > >> >> [ 152.732828] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00033-g8052ce2 #1 > >> >> [ 152.742576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 > >> >> [ 152.754081] 00000000 00200246 c0033e78 c13c456b c0033ea4 c13e72ac c0033e90 c103fd02 > >> >> [ 152.765720] 0000003c 6b6b6b6b cc0230d4 cc02300c c0033eac c103fd41 00000009 00000000 > >> >> [ 152.776864] c0033ea4 c1f7bc14 c0033ec0 c0033ecc c13e72ac c1f7bace 0000003c c1f7bc14 > >> >> [ 152.788472] Call Trace: > >> >> [ 152.791964] [] dump_stack+0x75/0xa9 > >> >> [ 152.797835] [] ? __list_del_entry+0x4d/0x71 > >> >> [ 152.804221] [] __warn+0xad/0xc4 > >> >> [ 152.809694] [] warn_slowpath_fmt+0x28/0x2d > >> >> [ 152.816501] [] __list_del_entry+0x4d/0x71 > >> >> [ 152.822946] [] device_pm_remove+0x2b/0x5a > >> >> [ 152.829560] [] device_del+0x134/0x19a > >> >> [ 152.835558] [] ? dev_set_name+0x14/0x16 > >> >> [ 152.842135] [] platform_device_del+0x16/0x5d > >> >> [ 152.848909] [] init+0xe5/0x112 > >> >> [ 152.854149] [] ? usbip_host_init+0xe5/0xe5 > >> > >> What device is this? Looks like it's being removed right at init time?? > >> > >> >> [ 152.860796] [] do_one_initcall+0x7e/0x101 > >> >> [ 152.867316] [] ? parse_args+0x190/0x265 > >> >> [ 152.873497] [] ? kernel_init_freeable+0xd0/0x174 > >> >> [ 152.880923] [] kernel_init_freeable+0xec/0x174 > >> >> [ 152.887499] [] kernel_init+0x8/0xd0 > >> >> [ 152.893658] [] ret_from_kernel_thread+0xe/0x30 > >> >> [ 152.900759] [] ? rest_init+0x111/0x111 > >> >> [ 152.907334] ---[ end trace 2003739b08d25430 ]--- > >> >> [ 152.920040] ------------[ cut here ]------------ > >> > >> -Kees > >> > >> > >> -- > >> Kees Cook > >> Nexus Security > >> > > > > > > -- > Kees Cook > Nexus Security >