* WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() @ 2011-10-15 20:12 Sergey Senozhatsky 2011-10-15 21:42 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Sergey Senozhatsky @ 2011-10-15 20:12 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, Andrew Morton Hello, 3.1-rc9 [10172.218213] ------------[ cut here ]------------ [10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() [10172.218242] Hardware name: Aspire 5741G [10172.218248] Modules linked in: ipv6 usb_storage uas microcode snd_hda_codec_hdmi snd_hda_codec_realtek broadcom tg3 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd rndis_host cdc_ether usbnet evdev psmouse soundcore pcspkr mii snd_page_alloc libphy ac battery wmi button ehci_hcd sr_mod cdrom usbcore sd_mod ahci [10172.218330] Pid: 22953, comm: kworker/0:2 Not tainted 3.1.0-rc9-dbg-00681-gec325b2 #730 [10172.218335] Call Trace: [10172.218346] [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96 [10172.218353] [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17 [10172.218361] [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b [10172.218370] [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941 [10172.218381] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 [10172.218389] [<ffffffff8107197e>] lock_acquire+0x138/0x1ac [10172.218397] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 [10172.218404] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52 [10172.218414] [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49 [10172.218421] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 [10172.218428] [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45 [10172.218435] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52 [10172.218442] [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52 [10172.218449] [<ffffffff810349cc>] load_balance+0x1fc/0x769 [10172.218458] [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65 [10172.218466] [<ffffffff8148ca17>] ? 
__schedule+0x2f5/0xa2d [10172.218474] [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d [10172.218480] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d [10172.218490] [<ffffffff8104db06>] ? add_timer_on+0xd/0x196 [10172.218497] [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51 [10172.218505] [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c [10172.218512] [<ffffffff81059126>] ? process_one_work+0x498/0x54c [10172.218518] [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c [10172.218526] [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56 [10172.218533] [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e [10172.218540] [<ffffffff8148d26e>] schedule+0x55/0x57 [10172.218547] [<ffffffff8105970f>] worker_thread+0x217/0x21c [10172.218554] [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c [10172.218564] [<ffffffff8105d4de>] kthread+0x9a/0xa2 [10172.218573] [<ffffffff81497984>] kernel_thread_helper+0x4/0x10 [10172.218580] [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3 [10172.218587] [<ffffffff81490778>] ? retint_restore_args+0x13/0x13 [10172.218595] [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53 [10172.218602] [<ffffffff81497980>] ? gs_change+0x13/0x13 [10172.218607] ---[ end trace 9d11d6b5e4b96730 ]--- Really not sure what's going on and how to reproduce. Sergey ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() 2011-10-15 20:12 WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() Sergey Senozhatsky @ 2011-10-15 21:42 ` David Rientjes 2011-10-15 22:23 ` Borislav Petkov 0 siblings, 1 reply; 26+ messages in thread From: David Rientjes @ 2011-10-15 21:42 UTC (permalink / raw) To: Sergey Senozhatsky, Tejun Heo, Tejun Heo Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton On Sat, 15 Oct 2011, Sergey Senozhatsky wrote: > [10172.218213] ------------[ cut here ]------------ > [10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() > [10172.218242] Hardware name: Aspire 5741G > [10172.218248] Modules linked in: ipv6 usb_storage uas microcode snd_hda_codec_hdmi snd_hda_codec_realtek broadcom tg3 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd rndis_host cdc_ether usbnet evdev psmouse soundcore pcspkr mii > snd_page_alloc libphy ac battery wmi button ehci_hcd sr_mod cdrom usbcore sd_mod ahci > [10172.218330] Pid: 22953, comm: kworker/0:2 Not tainted 3.1.0-rc9-dbg-00681-gec325b2 #730 > [10172.218335] Call Trace: > [10172.218346] [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96 > [10172.218353] [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17 > [10172.218361] [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b > [10172.218370] [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941 > [10172.218381] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 > [10172.218389] [<ffffffff8107197e>] lock_acquire+0x138/0x1ac > [10172.218397] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 > [10172.218404] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52 > [10172.218414] [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49 > [10172.218421] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 > [10172.218428] [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45 > [10172.218435] [<ffffffff8102a5c4>] ? 
double_rq_lock+0x2e/0x52 > [10172.218442] [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52 > [10172.218449] [<ffffffff810349cc>] load_balance+0x1fc/0x769 > [10172.218458] [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65 > [10172.218466] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d > [10172.218474] [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d > [10172.218480] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d > [10172.218490] [<ffffffff8104db06>] ? add_timer_on+0xd/0x196 > [10172.218497] [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51 > [10172.218505] [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c > [10172.218512] [<ffffffff81059126>] ? process_one_work+0x498/0x54c > [10172.218518] [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c > [10172.218526] [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56 > [10172.218533] [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e > [10172.218540] [<ffffffff8148d26e>] schedule+0x55/0x57 > [10172.218547] [<ffffffff8105970f>] worker_thread+0x217/0x21c > [10172.218554] [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c > [10172.218564] [<ffffffff8105d4de>] kthread+0x9a/0xa2 > [10172.218573] [<ffffffff81497984>] kernel_thread_helper+0x4/0x10 > [10172.218580] [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3 > [10172.218587] [<ffffffff81490778>] ? retint_restore_args+0x13/0x13 > [10172.218595] [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53 > [10172.218602] [<ffffffff81497980>] ? gs_change+0x13/0x13 > [10172.218607] ---[ end trace 9d11d6b5e4b96730 ]--- I think this is a problem with lockdep itself, could you try reverting f59de8992aa6 ("lockdep: Clear whole lockdep_map on initialization") if this reliably happens everytime you reboot (lockdep will only emit this once and then will suppress future warnings until the next boot)? I think the new memset() is inadvertently clearing the name for double_unlock_balance(). ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() 2011-10-15 21:42 ` David Rientjes @ 2011-10-15 22:23 ` Borislav Petkov 2011-10-15 22:32 ` David Rientjes 0 siblings, 1 reply; 26+ messages in thread From: Borislav Petkov @ 2011-10-15 22:23 UTC (permalink / raw) To: David Rientjes Cc: Sergey Senozhatsky, Tejun Heo, Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton On Sat, Oct 15, 2011 at 02:42:14PM -0700, David Rientjes wrote: > On Sat, 15 Oct 2011, Sergey Senozhatsky wrote: > > > [10172.218213] ------------[ cut here ]------------ > > [10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() > > [10172.218242] Hardware name: Aspire 5741G > > [10172.218248] Modules linked in: ipv6 usb_storage uas microcode snd_hda_codec_hdmi snd_hda_codec_realtek broadcom tg3 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd rndis_host cdc_ether usbnet evdev psmouse soundcore pcspkr mii > > snd_page_alloc libphy ac battery wmi button ehci_hcd sr_mod cdrom usbcore sd_mod ahci > > [10172.218330] Pid: 22953, comm: kworker/0:2 Not tainted 3.1.0-rc9-dbg-00681-gec325b2 #730 > > [10172.218335] Call Trace: > > [10172.218346] [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96 > > [10172.218353] [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17 > > [10172.218361] [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b > > [10172.218370] [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941 > > [10172.218381] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 > > [10172.218389] [<ffffffff8107197e>] lock_acquire+0x138/0x1ac > > [10172.218397] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 > > [10172.218404] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52 > > [10172.218414] [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49 > > [10172.218421] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52 > > [10172.218428] [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45 > > [10172.218435] [<ffffffff8102a5c4>] ? 
double_rq_lock+0x2e/0x52 > > [10172.218442] [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52 > > [10172.218449] [<ffffffff810349cc>] load_balance+0x1fc/0x769 > > [10172.218458] [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65 > > [10172.218466] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d > > [10172.218474] [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d > > [10172.218480] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d > > [10172.218490] [<ffffffff8104db06>] ? add_timer_on+0xd/0x196 > > [10172.218497] [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51 > > [10172.218505] [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c > > [10172.218512] [<ffffffff81059126>] ? process_one_work+0x498/0x54c > > [10172.218518] [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c > > [10172.218526] [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56 > > [10172.218533] [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e > > [10172.218540] [<ffffffff8148d26e>] schedule+0x55/0x57 > > [10172.218547] [<ffffffff8105970f>] worker_thread+0x217/0x21c > > [10172.218554] [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c > > [10172.218564] [<ffffffff8105d4de>] kthread+0x9a/0xa2 > > [10172.218573] [<ffffffff81497984>] kernel_thread_helper+0x4/0x10 > > [10172.218580] [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3 > > [10172.218587] [<ffffffff81490778>] ? retint_restore_args+0x13/0x13 > > [10172.218595] [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53 > > [10172.218602] [<ffffffff81497980>] ? gs_change+0x13/0x13 > > [10172.218607] ---[ end trace 9d11d6b5e4b96730 ]--- > > I think this is a problem with lockdep itself, could you try reverting > f59de8992aa6 ("lockdep: Clear whole lockdep_map on initialization") if > this reliably happens everytime you reboot (lockdep will only emit this > once and then will suppress future warnings until the next boot)? > > I think the new memset() is inadvertently clearing the name for > double_unlock_balance(). 
Great, so I'm not the only one seeing the above: http://marc.info/?l=linux-kernel&m=131468805610527 Due to it being very hard to reproduce, we dismissed it then as a possible hw corruption. But yeah, it looks like I have triggered it on -rc9 too, just the other day. Oh, and I see -rc6 and -rc8 warnings in the logs too. Ok, correction, not that hard to trigger. Oct 11 09:08:11 liondog kernel: [15367.473110] ------------[ cut here ]------------ Oct 11 09:08:11 liondog kernel: [15367.473135] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x173/0x17b5() Oct 11 09:08:11 liondog kernel: [15367.473145] Hardware name: System Product Name Oct 11 09:08:11 liondog kernel: [15367.473152] Modules linked in: cryptd aes_x86_64 aes_generic nls_iso8859_15 nls_cp437 tun cpufreq_powersave cpufreq_userspace cpufreq_conservative powernow_k8 mperf cpufreq_stats binfmt_misc fuse dm_crypt dm_mod ipv6 kvm_amd kvm vfat fat radeon 8250_pnp 8250 ttm drm_kms_helper cfbcopyarea edac_core serial_core cfbimgblt cfbfillrect k10temp Oct 11 09:08:11 liondog kernel: [15367.473256] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0-rc9-00005-g538d2882213e #5 Oct 11 09:08:11 liondog kernel: [15367.473264] Call Trace: Oct 11 09:08:11 liondog kernel: [15367.473270] <IRQ> [<ffffffff810367ff>] warn_slowpath_common+0x83/0x9b Oct 11 09:08:11 liondog kernel: [15367.473298] [<ffffffff81036831>] warn_slowpath_null+0x1a/0x1c Oct 11 09:08:11 liondog kernel: [15367.473309] [<ffffffff810691fc>] __lock_acquire+0x173/0x17b5 Oct 11 09:08:11 liondog kernel: [15367.473321] [<ffffffff8106a82c>] ? __lock_acquire+0x17a3/0x17b5 Oct 11 09:08:11 liondog kernel: [15367.473334] [<ffffffff81029c53>] ? double_rq_lock+0x4d/0x52 Oct 11 09:08:11 liondog kernel: [15367.473346] [<ffffffff8106ae67>] lock_acquire+0x154/0x198 Oct 11 09:08:11 liondog kernel: [15367.473356] [<ffffffff81029c53>] ? double_rq_lock+0x4d/0x52 Oct 11 09:08:11 liondog kernel: [15367.473368] [<ffffffff810659de>] ? 
put_lock_stats.isra.15+0xe/0x29 Oct 11 09:08:11 liondog kernel: [15367.473382] [<ffffffff813c1c49>] _raw_spin_lock_nested+0x44/0x79 Oct 11 09:08:11 liondog kernel: [15367.473392] [<ffffffff81029c53>] ? double_rq_lock+0x4d/0x52 Oct 11 09:08:11 liondog kernel: [15367.473403] [<ffffffff813c1b84>] ? _raw_spin_lock+0x6c/0x73 Oct 11 09:08:11 liondog kernel: [15367.473413] [<ffffffff81029c34>] ? double_rq_lock+0x2e/0x52 Oct 11 09:08:11 liondog kernel: [15367.473423] [<ffffffff81029c53>] double_rq_lock+0x4d/0x52 Oct 11 09:08:11 liondog kernel: [15367.473434] [<ffffffff8102e7d5>] load_balance+0x1b7/0x4f7 Oct 11 09:08:11 liondog kernel: [15367.473447] [<ffffffff8102ec79>] rebalance_domains+0x164/0x1f9 Oct 11 09:08:11 liondog kernel: [15367.473458] [<ffffffff8102eb15>] ? load_balance+0x4f7/0x4f7 Oct 11 09:08:11 liondog kernel: [15367.473470] [<ffffffff8102edcb>] run_rebalance_domains+0xbd/0x12a Oct 11 09:08:11 liondog kernel: [15367.473487] [<ffffffff8103d180>] __do_softirq+0x165/0x2eb Oct 11 09:08:11 liondog kernel: [15367.473499] [<ffffffff81070ff4>] ? generic_smp_call_function_single_interrupt+0x9f/0xd8 Oct 11 09:08:11 liondog kernel: [15367.473512] [<ffffffff813c472c>] call_softirq+0x1c/0x30 Oct 11 09:08:11 liondog kernel: [15367.473525] [<ffffffff810036bb>] do_softirq+0x3d/0x86 Oct 11 09:08:11 liondog kernel: [15367.473535] [<ffffffff8103d561>] irq_exit+0x53/0xbd Oct 11 09:08:11 liondog kernel: [15367.473548] [<ffffffff81018155>] smp_call_function_single_interrupt+0x34/0x37 Oct 11 09:08:11 liondog kernel: [15367.473560] [<ffffffff813c41b0>] call_function_single_interrupt+0x70/0x80 Oct 11 09:08:11 liondog kernel: [15367.473567] <EOI> [<ffffffff8105b7f3>] ? local_clock+0xf/0x3b Oct 11 09:08:11 liondog kernel: [15367.473586] [<ffffffff8105b7f3>] ? local_clock+0xf/0x3b Oct 11 09:08:11 liondog kernel: [15367.473598] [<ffffffff810090b7>] ? default_idle+0xf1/0x1fd Oct 11 09:08:11 liondog kernel: [15367.473610] [<ffffffff810090b5>] ? 
default_idle+0xef/0x1fd Oct 11 09:08:11 liondog kernel: [15367.473621] [<ffffffff8100931a>] amd_e400_idle+0xc4/0xe7 Oct 11 09:08:11 liondog kernel: [15367.473632] [<ffffffff8100074c>] cpu_idle+0x67/0xbe Oct 11 09:08:11 liondog kernel: [15367.473645] [<ffffffff813b54af>] start_secondary+0x1ad/0x1b2 Oct 11 09:08:11 liondog kernel: [15367.473655] ---[ end trace 63070f7e22365bb6 ]--- -- Regards/Gruss, Boris. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: David Rientjes @ 2011-10-15 22:32 UTC
To: Borislav Petkov, Sergey Senozhatsky, Tejun Heo, Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton

On Sun, 16 Oct 2011, Borislav Petkov wrote:

> > I think this is a problem with lockdep itself, could you try reverting
> > f59de8992aa6 ("lockdep: Clear whole lockdep_map on initialization") if
> > this reliably happens everytime you reboot (lockdep will only emit this
> > once and then will suppress future warnings until the next boot)?
> >
> > I think the new memset() is inadvertently clearing the name for
> > double_unlock_balance().
>
> Great,
>
> so I'm not the only one seeing the above:
> http://marc.info/?l=linux-kernel&m=131468805610527
>
> Due to it being very hard to reproduce, we dismissed it then as a
> possible hw corruption.
>
> But yeah, it looks like I have triggered it on -rc9 too, just the
> other day. Oh, and I see -rc6 and -rc8 warnings in the logs too. Ok,
> correction, not that hard to trigger.
>

Could you try to revert f59de8992aa6 ("lockdep: Clear whole lockdep_map on
initialization") with this patch and see if it helps? Thanks.
---
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2874,7 +2874,10 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
 void lockdep_init_map(struct lockdep_map *lock, const char *name,
                       struct lock_class_key *key, int subclass)
 {
-        memset(lock, 0, sizeof(*lock));
+        int i;
+
+        for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
+                lock->class_cache[i] = NULL;
 
 #ifdef CONFIG_LOCK_STAT
         lock->cpu = raw_smp_processor_id();
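To make the difference between the two variants in the patch concrete, here is a minimal user-space sketch. It assumes a simplified, hypothetical struct demo_map standing in for struct lockdep_map (the real kernel structure has more fields), and clear_whole_map() and clear_class_cache() are illustrative names, not kernel functions. It only shows which fields each variant touches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NR_LOCKDEP_CACHING_CLASSES. */
#define DEMO_NR_CACHING_CLASSES 2

/* Hypothetical, simplified stand-in for struct lockdep_map. */
struct demo_map {
        const void *key;
        const void *class_cache[DEMO_NR_CACHING_CLASSES];
        const char *name;
};

/* Mirrors the memset() variant: every field, including name, is zeroed. */
static void clear_whole_map(struct demo_map *lock)
{
        assert(lock != NULL);
        memset(lock, 0, sizeof(*lock));
}

/* Mirrors the loop from the patch: only the cache slots are reset,
 * so key and name keep whatever values they had before. */
static void clear_class_cache(struct demo_map *lock)
{
        int i;

        assert(lock != NULL);
        for (i = 0; i < DEMO_NR_CACHING_CLASSES; i++)
                lock->class_cache[i] = NULL;
}
```

After clear_whole_map() the name pointer is NULL until the caller assigns it again, whereas after clear_class_cache() the old name pointer is still in place throughout.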
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-10-16 5:09 UTC
To: David Rientjes
Cc: Borislav Petkov, Tejun Heo, Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton

On (10/15/11 15:32), David Rientjes wrote:
> > > I think this is a problem with lockdep itself, could you try reverting
> > > f59de8992aa6 ("lockdep: Clear whole lockdep_map on initialization") if
> > > this reliably happens everytime you reboot (lockdep will only emit this
> > > once and then will suppress future warnings until the next boot)?
> > >
> > > I think the new memset() is inadvertently clearing the name for
> > > double_unlock_balance().
> >
> > Great,
> >
> > so I'm not the only one seeing the above:
> > http://marc.info/?l=linux-kernel&m=131468805610527
> >
> > Due to it being very hard to reproduce, we dismissed it then as a
> > possible hw corruption.
> >
> > But yeah, it looks like I have triggered it on -rc9 too, just the
> > other day. Oh, and I see -rc6 and -rc8 warnings in the logs too. Ok,
> > correction, not that hard to trigger.
> >
>
> Could you try to revert f59de8992aa6 ("lockdep: Clear whole lockdep_map on
> initialization") with this patch and see if it helps? Thanks.

Sure, I'd love to and will do; it's just that I'm not sure I can easily reproduce it.

> ---
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -2874,7 +2874,10 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
>  void lockdep_init_map(struct lockdep_map *lock, const char *name,
>                        struct lock_class_key *key, int subclass)
>  {
> -        memset(lock, 0, sizeof(*lock));
> +        int i;
> +
> +        for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
> +                lock->class_cache[i] = NULL;
>
>  #ifdef CONFIG_LOCK_STAT
>          lock->cpu = raw_smp_processor_id();
>
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Borislav Petkov @ 2011-10-20 18:39 UTC
To: David Rientjes
Cc: Sergey Senozhatsky, Tejun Heo, Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton

On Sat, Oct 15, 2011 at 03:32:32PM -0700, David Rientjes wrote:
> Could you try to revert f59de8992aa6 ("lockdep: Clear whole lockdep_map on
> initialization") with this patch and see if it helps? Thanks.
> ---
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -2874,7 +2874,10 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
>  void lockdep_init_map(struct lockdep_map *lock, const char *name,
>                        struct lock_class_key *key, int subclass)
>  {
> -        memset(lock, 0, sizeof(*lock));
> +        int i;
> +
> +        for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
> +                lock->class_cache[i] = NULL;
>
>  #ifdef CONFIG_LOCK_STAT
>          lock->cpu = raw_smp_processor_id();

FWIW,

the box has been running here with f59de8992aa6 reverted for a couple of
days now and no sign of the warning. I'll keep watching it but it looks
ok so far, so David, you could've nailed it.

Thanks.

--
Regards/Gruss,
Boris.
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-10-20 18:53 UTC
To: Borislav Petkov, David Rientjes, Tejun Heo, Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton

On (10/20/11 20:39), Borislav Petkov wrote:
> On Sat, Oct 15, 2011 at 03:32:32PM -0700, David Rientjes wrote:
> > Could you try to revert f59de8992aa6 ("lockdep: Clear whole lockdep_map on
> > initialization") with this patch and see if it helps? Thanks.
> > ---
> > diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> > --- a/kernel/lockdep.c
> > +++ b/kernel/lockdep.c
> > @@ -2874,7 +2874,10 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
> >  void lockdep_init_map(struct lockdep_map *lock, const char *name,
> >                        struct lock_class_key *key, int subclass)
> >  {
> > -        memset(lock, 0, sizeof(*lock));
> > +        int i;
> > +
> > +        for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
> > +                lock->class_cache[i] = NULL;
> >
> >  #ifdef CONFIG_LOCK_STAT
> >          lock->cpu = raw_smp_processor_id();
>
> FWIW,
>
> the box has been running here with f59de8992aa6 reverted for a couple of
> days now and no sign of the warning. I'll keep watching it but it looks
> ok so far, so David, you could've nailed it.
>

Hello,
Well, the same here. My laptop has been running with f59de8992aa6 reverted without any
problems so far. Yet I'm not sure I understand how memset() and the loop could
produce different results.

The commit in question (f59de8992aa6dc85e81aadc26b0f69e17809721d) was merged on
Jul 14 15:19:09 2011 +0200, so, Borislav, you probably should have seen it
not only on 3.1-rc5, 3.1-rc6, ..., but even on 3.0.

Sergey
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-10-20 19:07 UTC
To: Borislav Petkov, David Rientjes, Tejun Heo, Tejun Heo, Peter Zijlstra, Ingo Molnar, linux-kernel, Andrew Morton

On (10/20/11 21:53), Sergey Senozhatsky wrote:
> Date: Thu, 20 Oct 2011 21:53:29 +0300
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> To: Borislav Petkov <bp@alien8.de>, David Rientjes <rientjes@google.com>,
>  Tejun Heo <tj@kernel.org>, Tejun Heo <htejun@gmail.com>, Peter Zijlstra
>  <peterz@infradead.org>, Ingo Molnar <mingo@elte.hu>,
>  linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
> Subject: Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
> User-Agent: Mutt/1.5.21 (2010-09-15)
>
> On (10/20/11 20:39), Borislav Petkov wrote:
> > On Sat, Oct 15, 2011 at 03:32:32PM -0700, David Rientjes wrote:
> > > Could you try to revert f59de8992aa6 ("lockdep: Clear whole lockdep_map on
> > > initialization") with this patch and see if it helps? Thanks.
> > > ---
> > > diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> > > --- a/kernel/lockdep.c
> > > +++ b/kernel/lockdep.c
> > > @@ -2874,7 +2874,10 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
> > >  void lockdep_init_map(struct lockdep_map *lock, const char *name,
> > >                        struct lock_class_key *key, int subclass)
> > >  {
> > > -        memset(lock, 0, sizeof(*lock));
> > > +        int i;
> > > +
> > > +        for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
> > > +                lock->class_cache[i] = NULL;
> > >
> > >  #ifdef CONFIG_LOCK_STAT
> > >          lock->cpu = raw_smp_processor_id();
> >
> > FWIW,
> >
> > the box has been running here with f59de8992aa6 reverted for a couple of
> > days now and no sign of the warning. I'll keep watching it but it looks
> > ok so far, so David, you could've nailed it.
> >
>
> Hello,
> Well, the same with me. My laptop has been running with reverted f59de8992aa6 without any
> problems so far. Yet, I'm not sure I understand how memset() and loop could
> produce different results.
>

Oh, well, never mind, I think I get it.

Reverting opens https://bugzilla.kernel.org/show_bug.cgi?id=35532 again.

> commit in question (f59de8992aa6dc85e81aadc26b0f69e17809721d) has been merge on
> Jul 14 15:19:09 2011 +0200, so, Borislav, you probably should have seen it
> not only on 3.1-rc5, 3.1-rc6,..., but even on 3.0.
>
> Sergey
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: David Rientjes @ 2011-10-20 21:17 UTC
To: Sergey Senozhatsky, Ingo Molnar, Tejun Heo, Tejun Heo
Cc: Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Thu, 20 Oct 2011, Sergey Senozhatsky wrote:

> > > FWIW,
> > >
> > > the box has been running here with f59de8992aa6 reverted for a couple of
> > > days now and no sign of the warning. I'll keep watching it but it looks
> > > ok so far, so David, you could've nailed it.
> > >
> >
> > Hello,
> > Well, the same with me. My laptop has been running with reverted f59de8992aa6 without any
> > problems so far. Yet, I'm not sure I understand how memset() and loop could
> > produce different results.
> >
>
> Oh, well, nevermind I think I get it.
>
> Reverting opens https://bugzilla.kernel.org/show_bug.cgi?id=35532 again.
>

I don't know what that is since bugzilla.kernel.org is down :) The
problem is that the memset(), in addition to all the other fields in
lockdep_map, clears the "name" field, which is what the scheduler uses
via lock_set_subclass() to prevent this lockdep warning. My initial
speculation seems to be confirmed, since neither you nor Borislav has been
able to reproduce the warning since removing the memset().

Tejun, would you like to revert f59de8992aa6 ("lockdep: Clear whole
lockdep_map on initialization") since it fixes this lockdep warning?
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Tejun Heo @ 2011-10-20 21:23 UTC
To: David Rientjes
Cc: Sergey Senozhatsky, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

Hello,

On Thu, Oct 20, 2011 at 02:17:29PM -0700, David Rientjes wrote:
> On Thu, 20 Oct 2011, Sergey Senozhatsky wrote:
>
> > > > FWIW,
> > > >
> > > > the box has been running here with f59de8992aa6 reverted for a couple of
> > > > days now and no sign of the warning. I'll keep watching it but it looks
> > > > ok so far, so David, you could've nailed it.
> > > >
> > >
> > > Hello,
> > > Well, the same with me. My laptop has been running with reverted f59de8992aa6 without any
> > > problems so far. Yet, I'm not sure I understand how memset() and loop could
> > > produce different results.
> > >
> >
> > Oh, well, nevermind I think I get it.
> >
> > Reverting opens https://bugzilla.kernel.org/show_bug.cgi?id=35532 again.
> >
>
> I don't know what that is since bugzilla.kernel.org is down :) The
> problem is that the memset(), in addition to all the other fields in
> lockdep_map, clears the "name" field, which is what the scheduler uses
> via lock_set_sublcass() to prevent this lockdep warning. My initial
> speculation seems to be confirmed since either you or Borislav have been
> able to reproduce the warning since removing the memset().
>
> Tejun, would you like to revert f59de8992aa6 ("lockdep: Clear whole
> lockdep_map on initialization") since it fixes this lockdep warning?

Hmmm... the issue was that kmemcheck noticed that memory regions in
lockdep_map are accessed before being set to any value. I'm feeling
dim as usual and don't understand what's going on here. The function
looks like the following.

  void lockdep_init_map(struct lockdep_map *lock, const char *name,
                        struct lock_class_key *key, int subclass)
  {
          memset(lock, 0, sizeof(*lock));

  #ifdef CONFIG_LOCK_STAT
          lock->cpu = raw_smp_processor_id();
  #endif
          if (DEBUG_LOCKS_WARN_ON(!name)) {
                  lock->name = "NULL";
                  return;
          }

          lock->name = name;

So, according to this thread, the problem is that the memset() clears
lock->name field, right? But how can that be a problem? lock->name
is always set to either "NULL" or @name. Why would clearing it before
setting make any difference? What am I missing?

Thanks.

--
tejun
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: David Rientjes @ 2011-10-20 21:31 UTC
To: Tejun Heo
Cc: Sergey Senozhatsky, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Thu, 20 Oct 2011, Tejun Heo wrote:

> > Tejun, would you like to revert f59de8992aa6 ("lockdep: Clear whole
> > lockdep_map on initialization") since it fixes this lockdep warning?
>
> Hmmm... the issue was that kmemcheck noticed that memory regions in
> lockdep_map are accessed before being set to any value. I'm feeling
> dim as usual and don't understand what's going on here. The function
> looks like the following.
>
>
>   void lockdep_init_map(struct lockdep_map *lock, const char *name,
>                         struct lock_class_key *key, int subclass)
>   {
>           memset(lock, 0, sizeof(*lock));
>
>   #ifdef CONFIG_LOCK_STAT
>           lock->cpu = raw_smp_processor_id();
>   #endif
>           if (DEBUG_LOCKS_WARN_ON(!name)) {
>                   lock->name = "NULL";
>                   return;
>           }
>
>           lock->name = name;
>
>
> So, according to this thread, the problem is that the memset() clears
> lock->name field, right?

Right, and reverting f59de8992aa6 ("lockdep: Clear whole lockdep_map on
initialization") seems to fix the lockdep warning.

> But how can that be a problem? lock->name
> is always set to either "NULL" or @name. Why would clearing it before
> setting make any difference? What am I missing?
>

The scheduler (in sched_fair and sched_rt) calls lock_set_subclass() from
double_unlock_balance(), which sets the name, but there's a race between
when the name is cleared by the memset() and when lock->name is set again,
a window in which lockdep can find that the two names do not match.
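The window David describes can be illustrated with a small user-space sketch. It assumes hypothetical, simplified structures (demo_class, demo_map) and helper names (names_match(), reinit_clear(), reinit_set()) that do not exist in the kernel, and it compares name pointers the same way as the check behind the warning, WARN_ON_ONCE(class->name != lock->name), which is quoted later in the thread. It only shows why a transiently cleared name would trip that check; it does not establish that such a race actually happens in the scheduler path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct lock_class and
 * struct lockdep_map; only the fields needed for the check are kept. */
struct demo_class {
        const void *key;
        const char *name;
};

struct demo_map {
        const void *key;
        const char *name;
};

/* Inverse of the WARN_ON_ONCE() condition: the names are compared by
 * pointer identity, not by string contents. */
static bool names_match(const struct demo_class *cls,
                        const struct demo_map *lock)
{
        return cls->name == lock->name;
}

/* First half of a memset()-style re-initialisation: everything is zeroed,
 * so lock->name is NULL until reinit_set() runs. */
static void reinit_clear(struct demo_map *lock)
{
        memset(lock, 0, sizeof(*lock));
}

/* Second half: the fields are filled in again. */
static void reinit_set(struct demo_map *lock, const void *key,
                       const char *name)
{
        assert(name != NULL);
        lock->key = key;
        lock->name = name;
}
```

Calling names_match() between reinit_clear() and reinit_set() reports a mismatch, while calling it before or after the pair reports a match. A re-initialisation that never clears the name has no such intermediate state.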
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Tejun Heo @ 2011-10-20 21:36 UTC
To: David Rientjes
Cc: Sergey Senozhatsky, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

Hello,

On Thu, Oct 20, 2011 at 02:31:39PM -0700, David Rientjes wrote:
> > So, according to this thread, the problem is that the memset() clears
> > lock->name field, right?
>
> Right, and reverting f59de8992aa6 ("lockdep: Clear whole lockdep_map on
> initialization") seems to fix the lockdep warning.
>
> > But how can that be a problem? lock->name
> > is always set to either "NULL" or @name. Why would clearing it before
> > setting make any difference? What am I missing?
> >
>
> The scheduler (in sched_fair and sched_rt) calls lock_set_subclass() which
> sets the name in double_unlock_balance() to set the name but there's a
> race between when that is cleared with the memset() and setting of
> lock->name where lockdep can find them to match.

Hmmm... so lock_set_subclass() is racing against lockdep_init_map()? That
sounds very fishy and probably needs a better fix. Anyway, if no one
can come up with a proper solution, please feel free to revert the
commit.

Thanks.

--
tejun
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-10-20 23:00 UTC
To: Tejun Heo
Cc: David Rientjes, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On (10/20/11 14:36), Tejun Heo wrote:
> Hello,
>
> On Thu, Oct 20, 2011 at 02:31:39PM -0700, David Rientjes wrote:
> > > So, according to this thread, the problem is that the memset() clears
> > > lock->name field, right?
> >
> > Right, and reverting f59de8992aa6 ("lockdep: Clear whole lockdep_map on
> > initialization") seems to fix the lockdep warning.
> >
> > > But how can that be a problem?  lock->name
> > > is always set to either "NULL" or @name.  Why would clearing it before
> > > setting make any difference?  What am I missing?
> > >
> >
> > The scheduler (in sched_fair and sched_rt) calls lock_set_subclass() in
> > double_unlock_balance(), which sets the name again, but there's a race
> > between when that is cleared with the memset() and the setting of
> > lock->name where lockdep can find them not to match.
>
> Hmmm... so lock_set_subclass() is racing against lockdep_init_map()?
> That sounds very fishy and probably needs a better fix.  Anyways, if
> someone can't come up with a proper solution, please feel free to
> revert the commit.
>

I thought I had started to understand this, but that was the wrong feeling.

The error indeed is that the class name and the lock name mismatch:

 689         if (class->key == key) {
 690                 WARN_ON_ONCE(class->name != lock->name);
 691                 return class;
 692         }

And the problem, as far as I understand, only shows up when
active_load_balance_cpu_stop() gets called on an rq with active_balance.

double_unlock_balance() is called with the busiest_rq spin lock held, and
I don't see who calls lockdep_init_map() on busiest_rq anywhere around.
work_struct has its own lockdep_map, touched after
__queue_work(cpu, wq, work).

I'm not sure that reverting is the best option we have, since it doesn't
fix the possible race condition, it just masks it.

I'm not very lucky at reproducing the issue; in fact I have had only one
trace so far.

[10172.218213] ------------[ cut here ]------------
[10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
[10172.218346] [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96
[10172.218353] [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17
[10172.218361] [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b
[10172.218370] [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941
[10172.218381] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
[10172.218389] [<ffffffff8107197e>] lock_acquire+0x138/0x1ac
[10172.218397] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
[10172.218404] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
[10172.218414] [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49
[10172.218421] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
[10172.218428] [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45
[10172.218435] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
[10172.218442] [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52
[10172.218449] [<ffffffff810349cc>] load_balance+0x1fc/0x769
[10172.218458] [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65
[10172.218466] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
[10172.218474] [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d
[10172.218480] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
[10172.218490] [<ffffffff8104db06>] ? add_timer_on+0xd/0x196
[10172.218497] [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51
[10172.218505] [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c
[10172.218512] [<ffffffff81059126>] ? process_one_work+0x498/0x54c
[10172.218518] [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c
[10172.218526] [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56
[10172.218533] [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e
[10172.218540] [<ffffffff8148d26e>] schedule+0x55/0x57
[10172.218547] [<ffffffff8105970f>] worker_thread+0x217/0x21c
[10172.218554] [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c
[10172.218564] [<ffffffff8105d4de>] kthread+0x9a/0xa2
[10172.218573] [<ffffffff81497984>] kernel_thread_helper+0x4/0x10
[10172.218580] [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3
[10172.218587] [<ffffffff81490778>] ? retint_restore_args+0x13/0x13
[10172.218595] [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53
[10172.218602] [<ffffffff81497980>] ? gs_change+0x13/0x13
[10172.218607] ---[ end trace 9d11d6b5e4b96730 ]---

	Sergey
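One detail of the check at lockdep.c:689-692 that is easy to miss: the names are compared as pointers, not as strings, so a cleared (NULL) name or a different copy of the same string both count as a mismatch. A much simplified userspace sketch of that lookup follows; `struct class_model` and `look_up_class_model()` are hypothetical names for illustration, and the real look_up_lock_class() derives the key from the lock and walks a hash chain rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lock_class: just a key and a name. */
struct class_model {
	const void *key;
	const char *name;
};

/*
 * Find the class registered for @key.  When one is found, *mismatch
 * reports whether its name pointer differs from @lock_name, which is
 * the condition the WARN_ON_ONCE() in the snippet above tests.
 */
static const struct class_model *
look_up_class_model(const struct class_model *classes, size_t nr,
		    const void *key, const char *lock_name, int *mismatch)
{
	size_t i;

	assert(classes && mismatch);
	*mismatch = 0;
	for (i = 0; i < nr; i++) {
		if (classes[i].key == key) {
			/* Pointer identity, not strcmp(). */
			*mismatch = (classes[i].name != lock_name);
			return &classes[i];
		}
	}
	return NULL;	/* no class registered for this key yet */
}
```

With a class registered under some key and name, a lookup with the same name pointer reports no mismatch, while a lookup with NULL (the state right after a memset()) or with another array holding the same characters does.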
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: David Rientjes @ 2011-10-21 9:14 UTC
To: Sergey Senozhatsky
Cc: Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Fri, 21 Oct 2011, Sergey Senozhatsky wrote:

> I thought I had started to understand this, but that was the wrong feeling.
>
> The error indeed is that the class name and the lock name mismatch:
>
>  689         if (class->key == key) {
>  690                 WARN_ON_ONCE(class->name != lock->name);
>  691                 return class;
>  692         }
>
> And the problem, as far as I understand, only shows up when
> active_load_balance_cpu_stop() gets called on an rq with active_balance.
>
> double_unlock_balance() is called with the busiest_rq spin lock held, and
> I don't see who calls lockdep_init_map() on busiest_rq anywhere around.
> work_struct has its own lockdep_map, touched after
> __queue_work(cpu, wq, work).
>
> I'm not sure that reverting is the best option we have, since it doesn't
> fix the possible race condition, it just masks it.
>

How does it mask the race condition?  Before the memset(), the ->name
field was never _cleared_ in lockdep_init_map() like it is now, it was
only stored.

> I'm not very lucky at reproducing the issue; in fact I have had only one
> trace so far.
>
> [10172.218213] ------------[ cut here ]------------
> [10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
> [10172.218346] [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96
> [10172.218353] [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17
> [10172.218361] [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b
> [10172.218370] [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941
> [10172.218381] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> [10172.218389] [<ffffffff8107197e>] lock_acquire+0x138/0x1ac
> [10172.218397] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> [10172.218404] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
> [10172.218414] [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49
> [10172.218421] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> [10172.218428] [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45
> [10172.218435] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
> [10172.218442] [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52
> [10172.218449] [<ffffffff810349cc>] load_balance+0x1fc/0x769
> [10172.218458] [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65
> [10172.218466] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
> [10172.218474] [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d
> [10172.218480] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
> [10172.218490] [<ffffffff8104db06>] ? add_timer_on+0xd/0x196
> [10172.218497] [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51
> [10172.218505] [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c
> [10172.218512] [<ffffffff81059126>] ? process_one_work+0x498/0x54c
> [10172.218518] [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c
> [10172.218526] [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56
> [10172.218533] [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e
> [10172.218540] [<ffffffff8148d26e>] schedule+0x55/0x57
> [10172.218547] [<ffffffff8105970f>] worker_thread+0x217/0x21c
> [10172.218554] [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c
> [10172.218564] [<ffffffff8105d4de>] kthread+0x9a/0xa2
> [10172.218573] [<ffffffff81497984>] kernel_thread_helper+0x4/0x10
> [10172.218580] [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3
> [10172.218587] [<ffffffff81490778>] ? retint_restore_args+0x13/0x13
> [10172.218595] [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53
> [10172.218602] [<ffffffff81497980>] ? gs_change+0x13/0x13
> [10172.218607] ---[ end trace 9d11d6b5e4b96730 ]---
>

This is with the revert?
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-10-21 9:26 UTC
To: David Rientjes
Cc: Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On (10/21/11 02:14), David Rientjes wrote:
> > I thought I had started to understand this, but that was the wrong feeling.
> >
> > The error indeed is that the class name and the lock name mismatch:
> >
> >  689         if (class->key == key) {
> >  690                 WARN_ON_ONCE(class->name != lock->name);
> >  691                 return class;
> >  692         }
> >
> > And the problem, as far as I understand, only shows up when
> > active_load_balance_cpu_stop() gets called on an rq with active_balance.
> >
> > double_unlock_balance() is called with the busiest_rq spin lock held, and
> > I don't see who calls lockdep_init_map() on busiest_rq anywhere around.
> > work_struct has its own lockdep_map, touched after
> > __queue_work(cpu, wq, work).
> >
> > I'm not sure that reverting is the best option we have, since it doesn't
> > fix the possible race condition, it just masks it.
> >
>
> How does it mask the race condition?  Before the memset(), the ->name
> field was never _cleared_ in lockdep_init_map() like it is now, it was
> only stored.
>

Well, if we have a race condition between `reader' and `writer', then it's
just our luck that we only hit it with the ->name modification.  It could be
`->cpu = raw_smp_processor_id()', or the loop iterating through
`class_cache' to NULL it.  The current implementation may only race on
`->name', but in theory we have a whole bunch of opportunities.

Of course I may be wrong.

> > I'm not very lucky at reproducing the issue; in fact I have had only one
> > trace so far.
> >
> > [10172.218213] ------------[ cut here ]------------
> > [10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
> > [10172.218346] [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96
> > [10172.218353] [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17
> > [10172.218361] [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b
> > [10172.218370] [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941
> > [10172.218381] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> > [10172.218389] [<ffffffff8107197e>] lock_acquire+0x138/0x1ac
> > [10172.218397] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> > [10172.218404] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
> > [10172.218414] [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49
> > [10172.218421] [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> > [10172.218428] [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45
> > [10172.218435] [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
> > [10172.218442] [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52
> > [10172.218449] [<ffffffff810349cc>] load_balance+0x1fc/0x769
> > [10172.218458] [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65
> > [10172.218466] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
> > [10172.218474] [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d
> > [10172.218480] [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
> > [10172.218490] [<ffffffff8104db06>] ? add_timer_on+0xd/0x196
> > [10172.218497] [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51
> > [10172.218505] [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c
> > [10172.218512] [<ffffffff81059126>] ? process_one_work+0x498/0x54c
> > [10172.218518] [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c
> > [10172.218526] [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56
> > [10172.218533] [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e
> > [10172.218540] [<ffffffff8148d26e>] schedule+0x55/0x57
> > [10172.218547] [<ffffffff8105970f>] worker_thread+0x217/0x21c
> > [10172.218554] [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c
> > [10172.218564] [<ffffffff8105d4de>] kthread+0x9a/0xa2
> > [10172.218573] [<ffffffff81497984>] kernel_thread_helper+0x4/0x10
> > [10172.218580] [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3
> > [10172.218587] [<ffffffff81490778>] ? retint_restore_args+0x13/0x13
> > [10172.218595] [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53
> > [10172.218602] [<ffffffff81497980>] ? gs_change+0x13/0x13
> > [10172.218607] ---[ end trace 9d11d6b5e4b96730 ]---
> >
>
> This is with the revert?
>

Nope, sorry for being unclear, this is the only trace I got.

	Sergey
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Yong Zhang @ 2011-10-21 9:45 UTC
To: David Rientjes
Cc: Sergey Senozhatsky, Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Fri, Oct 21, 2011 at 02:14:34AM -0700, David Rientjes wrote:
> How does it mask the race condition?  Before the memset(), the ->name
> field was never _cleared_ in lockdep_init_map() like it is now, it was
> only stored.

A typical race condition looks like this:

        CPU A                           CPU B
lock_set_subclass(lockA);
 lock_set_class(lockA);
  lockdep_init_map(lockA);
   /* lockA->name is cleared */
   memset(lockA);
                                __lock_acquire(lockA);
                                 /* lockA->class_cache[] is cleared */
                                 register_lock_class(lockA);
                                  look_up_lock_class(lockA);
                                   WARN_ON_ONCE(class->name !=
                                                lock->name);

   lock->name = name;

And an untested patch is below.

BTW, the patch should cure (I guess) the very issue reported in this
thread.  But it doesn't cover the case where the key changes and the
relevant lock_class already exists; I haven't thought of a way to fix
that yet :)  But the fact is we have no such caller yet; the only call
site of lock_set_subclass() is double_unlock_balance().

Thanks,
Yong

---
From: Yong Zhang <yong.zhang0@gmail.com>
Subject: [PATCH] lockdep: On-demand initialization for lock_set_class()

Since commit f59de89 [lockdep: Clear whole lockdep_map on initialization],
lockdep_init_map() clears the whole struct.  But that breaks
lock_set_class()/lock_set_subclass().  A typical race condition
looks like this:

        CPU A                           CPU B
lock_set_subclass(lockA);
 lock_set_class(lockA);
  lockdep_init_map(lockA);
   /* lockA->name is cleared */
   memset(lockA);
                                __lock_acquire(lockA);
                                 /* lockA->class_cache[] is cleared */
                                 register_lock_class(lockA);
                                  look_up_lock_class(lockA);
                                   WARN_ON_ONCE(class->name !=
                                                lock->name);

   lock->name = name;

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/lockdep.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 91d67ce..bc7dd1e 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3160,7 +3160,10 @@ __lock_set_class(struct lockdep_map *lock, const char *name,
 		return print_unlock_inbalance_bug(curr, lock, ip);
 
 found_it:
-	lockdep_init_map(lock, name, key, 0);
+	/* only changing lock->name make no sense */
+	WARN_ON(lock->key == key && lock->name != name);
+	if (lock->key != key)
+		lockdep_init_map(lock, name, key, 0);
 	class = register_lock_class(lock, subclass, 0);
 	hlock->class_idx = class - lock_classes + 1;
 
--
1.7.5.4
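The idea of the patch (only re-initialize the map when the key really changes) can be modelled outside the kernel. What follows is a rough sketch under stated assumptions, not the lockdep implementation: `struct map_model`, `init_map_model()` and `set_class_model()` are hypothetical names, the WARN_ON() is reduced to an out-parameter, and nothing here is concurrent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CACHE_MODEL 2	/* arbitrary size, for the sketch only */

/* Hypothetical stand-in for struct lockdep_map. */
struct map_model {
	const void *key;
	const char *name;
	const void *class_cache[NR_CACHE_MODEL];
};

/* Models lockdep_init_map() after f59de89: clear everything, then store. */
static void init_map_model(struct map_model *m, const char *name,
			   const void *key)
{
	assert(m && name && key);
	memset(m, 0, sizeof(*m));
	m->name = name;
	m->key = key;
}

/*
 * Models the patched found_it: path.  Returns 1 when the map was
 * re-initialized and 0 when it was left untouched.  *warned is set when
 * the key is unchanged but the name differs, the case the patch flags
 * with WARN_ON().
 */
static int set_class_model(struct map_model *m, const char *name,
			   const void *key, int *warned)
{
	assert(m && warned);
	*warned = (m->key == key && m->name != name);
	if (m->key == key)
		return 0;	/* subclass-only change: nothing is cleared */
	init_map_model(m, name, key);
	return 1;
}
```

A subclass-only change, which is what double_unlock_balance() does, passes the same key and name, so the name, key and cached classes stay as they were and no cleared state is ever visible. Only a real key change still goes through the clear-then-store path, which is the uncovered case mentioned above.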
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-11-03 7:17 UTC
To: Yong Zhang
Cc: David Rientjes, Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On (10/21/11 17:45), Yong Zhang wrote:
> On Fri, Oct 21, 2011 at 02:14:34AM -0700, David Rientjes wrote:
> > How does it mask the race condition?  Before the memset(), the ->name
> > field was never _cleared_ in lockdep_init_map() like it is now, it was
> > only stored.
>
> A typical race condition looks like this:
>
>         CPU A                           CPU B
> lock_set_subclass(lockA);
>  lock_set_class(lockA);
>   lockdep_init_map(lockA);
>    /* lockA->name is cleared */
>    memset(lockA);
>                                 __lock_acquire(lockA);
>                                  /* lockA->class_cache[] is cleared */
>                                  register_lock_class(lockA);
>                                   look_up_lock_class(lockA);
>                                    WARN_ON_ONCE(class->name !=
>                                                 lock->name);
>
>    lock->name = name;
>
> And an untested patch is below.
>
> BTW, the patch should cure (I guess) the very issue reported in this
> thread.  But it doesn't cover the case where the key changes and the
> relevant lock_class already exists; I haven't thought of a way to fix
> that yet :)  But the fact is we have no such caller yet; the only call
> site of lock_set_subclass() is double_unlock_balance().
>

Hello,
Any news on this patch? Do you like it or hate it? With recent kernels
I'm able to hit this problem more often (several times a day), so if any
testing is required I'm willing to help.

	Sergey

>
> ---
> From: Yong Zhang <yong.zhang0@gmail.com>
> Subject: [PATCH] lockdep: On-demand initialization for lock_set_class()
>
> Since commit f59de89 [lockdep: Clear whole lockdep_map on initialization],
> lockdep_init_map() clears the whole struct.  But that breaks
> lock_set_class()/lock_set_subclass().  A typical race condition
> looks like this:
>
>         CPU A                           CPU B
> lock_set_subclass(lockA);
>  lock_set_class(lockA);
>   lockdep_init_map(lockA);
>    /* lockA->name is cleared */
>    memset(lockA);
>                                 __lock_acquire(lockA);
>                                  /* lockA->class_cache[] is cleared */
>                                  register_lock_class(lockA);
>                                   look_up_lock_class(lockA);
>                                    WARN_ON_ONCE(class->name !=
>                                                 lock->name);
>
>    lock->name = name;
>
> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/lockdep.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/lockdep.c b/kernel/lockdep.c
> index 91d67ce..bc7dd1e 100644
> --- a/kernel/lockdep.c
> +++ b/kernel/lockdep.c
> @@ -3160,7 +3160,10 @@ __lock_set_class(struct lockdep_map *lock, const char *name,
>  		return print_unlock_inbalance_bug(curr, lock, ip);
>
>  found_it:
> -	lockdep_init_map(lock, name, key, 0);
> +	/* only changing lock->name make no sense */
> +	WARN_ON(lock->key == key && lock->name != name);
> +	if (lock->key != key)
> +		lockdep_init_map(lock, name, key, 0);
>  	class = register_lock_class(lock, subclass, 0);
>  	hlock->class_idx = class - lock_classes + 1;
>
> --
> 1.7.5.4
>
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Yong Zhang @ 2011-11-03 7:27 UTC
To: Sergey Senozhatsky
Cc: David Rientjes, Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Thu, Nov 03, 2011 at 10:17:36AM +0300, Sergey Senozhatsky wrote:
> On (10/21/11 17:45), Yong Zhang wrote:
> > On Fri, Oct 21, 2011 at 02:14:34AM -0700, David Rientjes wrote:
> > > How does it mask the race condition?  Before the memset(), the ->name
> > > field was never _cleared_ in lockdep_init_map() like it is now, it was
> > > only stored.
> >
> > A typical race condition looks like this:
> >
> >         CPU A                           CPU B
> > lock_set_subclass(lockA);
> >  lock_set_class(lockA);
> >   lockdep_init_map(lockA);
> >    /* lockA->name is cleared */
> >    memset(lockA);
> >                                 __lock_acquire(lockA);
> >                                  /* lockA->class_cache[] is cleared */
> >                                  register_lock_class(lockA);
> >                                   look_up_lock_class(lockA);
> >                                    WARN_ON_ONCE(class->name !=
> >                                                 lock->name);
> >
> >    lock->name = name;
> >
> > And an untested patch is below.
> >
> > BTW, the patch should cure (I guess) the very issue reported in this
> > thread.  But it doesn't cover the case where the key changes and the
> > relevant lock_class already exists; I haven't thought of a way to fix
> > that yet :)  But the fact is we have no such caller yet; the only call
> > site of lock_set_subclass() is double_unlock_balance().
> >
>
> Hello,
> Any news on this patch? Do you like it or hate it? With recent kernels
> I'm able to hit this problem more often (several times a day), so if any
> testing is required I'm willing to help.

Have you tried it?  I haven't found time to polish it yet, but
I think it will ease your concern.

Thanks,
Yong
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-11-03 7:45 UTC
To: Yong Zhang
Cc: David Rientjes, Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On (11/03/11 15:27), Yong Zhang wrote:
> > > A typical race condition looks like this:
> > >
> > >         CPU A                           CPU B
> > > lock_set_subclass(lockA);
> > >  lock_set_class(lockA);
> > >   lockdep_init_map(lockA);
> > >    /* lockA->name is cleared */
> > >    memset(lockA);
> > >                                 __lock_acquire(lockA);
> > >                                  /* lockA->class_cache[] is cleared */
> > >                                  register_lock_class(lockA);
> > >                                   look_up_lock_class(lockA);
> > >                                    WARN_ON_ONCE(class->name !=
> > >                                                 lock->name);
> > >
> > >    lock->name = name;
> > >
> > > And an untested patch is below.
> > >
> > > BTW, the patch should cure (I guess) the very issue reported in this
> > > thread.  But it doesn't cover the case where the key changes and the
> > > relevant lock_class already exists; I haven't thought of a way to fix
> > > that yet :)  But the fact is we have no such caller yet; the only call
> > > site of lock_set_subclass() is double_unlock_balance().
> > >
> >
> > Hello,
> > Any news on this patch? Do you like it or hate it? With recent kernels
> > I'm able to hit this problem more often (several times a day), so if any
> > testing is required I'm willing to help.
>
> Have you tried it?  I haven't found time to polish it yet, but
> I think it will ease your concern.
>

I'm compiling the kernel with your patch right now.
The whole point was just in case someone has a different approach or
whatsoever.

	Sergey
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Yong Zhang @ 2011-11-03 7:53 UTC
To: Sergey Senozhatsky
Cc: David Rientjes, Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Thu, Nov 03, 2011 at 10:45:06AM +0300, Sergey Senozhatsky wrote:
> On (11/03/11 15:27), Yong Zhang wrote:
> > Have you tried it?  I haven't found time to polish it yet, but
> > I think it will ease your concern.
> >
>
> I'm compiling the kernel with your patch right now.
> The whole point was just in case someone has a different approach or
> whatsoever.

Understood. If someone can come up with a simple patch which could
cover the case I mentioned before, that would be great.
/me goes to poke at it.

Thanks,
Yong
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Borislav Petkov @ 2011-11-04 9:25 UTC
To: Yong Zhang
Cc: Sergey Senozhatsky, David Rientjes, Tejun Heo, Ingo Molnar, Borislav Petkov, Peter Zijlstra, linux-kernel, Andrew Morton

On Thu, Nov 03, 2011 at 03:53:54PM +0800, Yong Zhang wrote:
> On Thu, Nov 03, 2011 at 10:45:06AM +0300, Sergey Senozhatsky wrote:
> > On (11/03/11 15:27), Yong Zhang wrote:
> > > Have you tried it?  I haven't found time to polish it yet, but
> > > I think it will ease your concern.
> > >
> >
> > I'm compiling the kernel with your patch right now.
> > The whole point was just in case someone has a different approach or
> > whatsoever.
>
> Understood. If someone can come up with a simple patch which could
> cover the case I mentioned before, that would be great.
> /me goes to poke at it.

I dunno whether this is related but I get the following on 3.1:

[ 5499.537074] INFO: trying to register non-static key.
[ 5499.537080] the code is fine but needs lockdep annotation.
[ 5499.537083] turning off the locking correctness validator.
[ 5499.537088] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0 #1
[ 5499.537091] Call Trace:
[ 5499.537094] <IRQ> [<ffffffff8107beed>] __lock_acquire+0x165d/0x1e30
[ 5499.537109] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
[ 5499.537115] [<ffffffff8107ccd3>] lock_acquire+0x93/0x160
[ 5499.537120] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
[ 5499.537126] [<ffffffff814d9866>] _raw_spin_lock+0x36/0x50
[ 5499.537130] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
[ 5499.537135] [<ffffffff810321fc>] double_rq_lock+0x2c/0x80
[ 5499.537140] [<ffffffff81039195>] load_balance+0x215/0x6c0
[ 5499.537146] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
[ 5499.537151] [<ffffffff810396fd>] rebalance_domains+0xbd/0x1d0
[ 5499.537155] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
[ 5499.537161] [<ffffffff810398ec>] run_rebalance_domains+0xdc/0x130
[ 5499.537166] [<ffffffff81048dcd>] __do_softirq+0xbd/0x290
[ 5499.537173] [<ffffffff814dc42c>] call_softirq+0x1c/0x30
[ 5499.537178] [<ffffffff81003eb5>] do_softirq+0x85/0xc0
[ 5499.537183] [<ffffffff810492ce>] irq_exit+0x9e/0xc0
[ 5499.537189] [<ffffffff8101ca9f>] smp_call_function_single_interrupt+0x2f/0x40
[ 5499.537195] [<ffffffff814dbeb0>] call_function_single_interrupt+0x70/0x80
[ 5499.537199] <EOI> [<ffffffff810096e6>] ? native_sched_clock+0x26/0x70
[ 5499.537212] [<ffffffffa0038e1a>] ? acpi_idle_enter_simple+0xee/0x11f [processor]
[ 5499.537221] [<ffffffffa0038e15>] ? acpi_idle_enter_simple+0xe9/0x11f [processor]
[ 5499.537227] [<ffffffff813f8b1d>] cpuidle_idle_call+0xdd/0x350
[ 5499.537233] [<ffffffff8100081f>] cpu_idle+0x6f/0xd0
[ 5499.537238] [<ffffffff814cc665>] start_secondary+0x1ae/0x1b3

--
Regards/Gruss,
Boris.
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
From: Sergey Senozhatsky @ 2011-11-04 9:31 UTC
To: Borislav Petkov
Cc: Yong Zhang, David Rientjes, Tejun Heo, Ingo Molnar, Peter Zijlstra, linux-kernel, Andrew Morton

On (11/04/11 10:25), Borislav Petkov wrote:
> > Understood. If someone can come up with a simple patch which could
> > cover the case I mentioned before, that would be great.
> > /me goes to poke at it.
>
> I dunno whether this is related but I get the following on 3.1:
>

I think this is a different problem: the check that the lockdep key is
`static' failed.

	Sergey

> [ 5499.537074] INFO: trying to register non-static key.
> [ 5499.537080] the code is fine but needs lockdep annotation.
> [ 5499.537083] turning off the locking correctness validator.
> [ 5499.537088] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0 #1
> [ 5499.537091] Call Trace:
> [ 5499.537094] <IRQ> [<ffffffff8107beed>] __lock_acquire+0x165d/0x1e30
> [ 5499.537109] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> [ 5499.537115] [<ffffffff8107ccd3>] lock_acquire+0x93/0x160
> [ 5499.537120] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> [ 5499.537126] [<ffffffff814d9866>] _raw_spin_lock+0x36/0x50
> [ 5499.537130] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> [ 5499.537135] [<ffffffff810321fc>] double_rq_lock+0x2c/0x80
> [ 5499.537140] [<ffffffff81039195>] load_balance+0x215/0x6c0
> [ 5499.537146] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> [ 5499.537151] [<ffffffff810396fd>] rebalance_domains+0xbd/0x1d0
> [ 5499.537155] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> [ 5499.537161] [<ffffffff810398ec>] run_rebalance_domains+0xdc/0x130
> [ 5499.537166] [<ffffffff81048dcd>] __do_softirq+0xbd/0x290
> [ 5499.537173] [<ffffffff814dc42c>] call_softirq+0x1c/0x30
> [ 5499.537178] [<ffffffff81003eb5>] do_softirq+0x85/0xc0
> [ 5499.537183] [<ffffffff810492ce>] irq_exit+0x9e/0xc0
> [ 5499.537189] [<ffffffff8101ca9f>] smp_call_function_single_interrupt+0x2f/0x40
> [ 5499.537195] [<ffffffff814dbeb0>] call_function_single_interrupt+0x70/0x80
> [ 5499.537199] <EOI> [<ffffffff810096e6>] ? native_sched_clock+0x26/0x70
> [ 5499.537212] [<ffffffffa0038e1a>] ? acpi_idle_enter_simple+0xee/0x11f [processor]
> [ 5499.537221] [<ffffffffa0038e15>] ? acpi_idle_enter_simple+0xe9/0x11f [processor]
> [ 5499.537227] [<ffffffff813f8b1d>] cpuidle_idle_call+0xdd/0x350
> [ 5499.537233] [<ffffffff8100081f>] cpu_idle+0x6f/0xd0
> [ 5499.537238] [<ffffffff814cc665>] start_secondary+0x1ae/0x1b3
>
> --
> Regards/Gruss,
> Boris.
>
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
  2011-11-04  9:31 ` Sergey Senozhatsky
@ 2011-11-07  4:54 ` Yong Zhang
  2011-11-07  8:43 ` Sergey Senozhatsky
  0 siblings, 1 reply; 26+ messages in thread
From: Yong Zhang @ 2011-11-07  4:54 UTC
To: Sergey Senozhatsky
Cc: Borislav Petkov, David Rientjes, Tejun Heo, Ingo Molnar,
    Peter Zijlstra, linux-kernel, Andrew Morton

On Fri, Nov 04, 2011 at 12:31:24PM +0300, Sergey Senozhatsky wrote:
> On (11/04/11 10:25), Borislav Petkov wrote:
> > > Understood. If someone can come up with a simple patch which could
> > > cover the case I mentioned before, that would be great.
> > > /me goes to poke at it.
> >
> > I dunno whether this is related but I get the following on 3.1:
> >
>
> I think this is a different problem. The check that the lockdep key is marked as `static' failed.

Actually the lockdep_init_map() in __lock_set_class could lead to
more problems, such as: a certain rq->lock could have a different 'key'
from the one we gave it in sched_init(), because rq is defined statically.

Given that, we could have another typical race:

	CPU A					CPU B
  lock_set_subclass(lockA);
   lock_set_class(lockA);
					/* lockA->class_cache[] is not set */
					register_lock_class(lockA);
					 look_up_lock_class(lockA); /* return NULL */
   lockdep_init_map(lockA);
    /* lockA->name is cleared */
    memset(lockA);
					if (!static_obj(lock->key))
					/* we get warning here */
    lock->name = name;

So memset() in lockdep_init_map() is still the culprit IMHO.

Thanks,
Yong

>
> 	Sergey
>
> > [ 5499.537074] INFO: trying to register non-static key.
> > [ 5499.537080] the code is fine but needs lockdep annotation.
> > [ 5499.537083] turning off the locking correctness validator.
> > [ 5499.537088] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0 #1
> > [ 5499.537091] Call Trace:
> > [ 5499.537094] <IRQ> [<ffffffff8107beed>] __lock_acquire+0x165d/0x1e30
> > [ 5499.537109] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> > [ 5499.537115] [<ffffffff8107ccd3>] lock_acquire+0x93/0x160
> > [ 5499.537120] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> > [ 5499.537126] [<ffffffff814d9866>] _raw_spin_lock+0x36/0x50
> > [ 5499.537130] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> > [ 5499.537135] [<ffffffff810321fc>] double_rq_lock+0x2c/0x80
> > [ 5499.537140] [<ffffffff81039195>] load_balance+0x215/0x6c0
> > [ 5499.537146] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> > [ 5499.537151] [<ffffffff810396fd>] rebalance_domains+0xbd/0x1d0
> > [ 5499.537155] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> > [ 5499.537161] [<ffffffff810398ec>] run_rebalance_domains+0xdc/0x130
> > [ 5499.537166] [<ffffffff81048dcd>] __do_softirq+0xbd/0x290
> > [ 5499.537173] [<ffffffff814dc42c>] call_softirq+0x1c/0x30
> > [ 5499.537178] [<ffffffff81003eb5>] do_softirq+0x85/0xc0
> > [ 5499.537183] [<ffffffff810492ce>] irq_exit+0x9e/0xc0
> > [ 5499.537189] [<ffffffff8101ca9f>] smp_call_function_single_interrupt+0x2f/0x40
> > [ 5499.537195] [<ffffffff814dbeb0>] call_function_single_interrupt+0x70/0x80
> > [ 5499.537199] <EOI> [<ffffffff810096e6>] ? native_sched_clock+0x26/0x70
> > [ 5499.537212] [<ffffffffa0038e1a>] ? acpi_idle_enter_simple+0xee/0x11f [processor]
> > [ 5499.537221] [<ffffffffa0038e15>] ? acpi_idle_enter_simple+0xe9/0x11f [processor]
> > [ 5499.537227] [<ffffffff813f8b1d>] cpuidle_idle_call+0xdd/0x350
> > [ 5499.537233] [<ffffffff8100081f>] cpu_idle+0x6f/0xd0
> > [ 5499.537238] [<ffffffff814cc665>] start_secondary+0x1ae/0x1b3
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Only stand for myself
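The interleaving Yong describes can be replayed deterministically in user space. The sketch below is not kernel code: `struct toy_lockdep_map`, `toy_static_obj()` and every other `toy_*` name are hypothetical stand-ins invented for illustration, and the real `static_obj()` checks address ranges rather than comparing against a single key. It only shows why a key check that lands between the memset() and the re-assignment of the key sees a NULL key and would warn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for the kernel's struct lockdep_map. */
struct toy_lockdep_map {
	const void *key;         /* identifies the lock class */
	const void *class_cache; /* NULL until a class has been cached */
	const char *name;
};

/* A key with static storage duration, which is what lockdep expects. */
static int toy_rq_lock_key;

/* Hypothetical stand-in for static_obj(): accepts only the one known static key. */
static int toy_static_obj(const void *obj)
{
	return obj == (const void *)&toy_rq_lock_key;
}

/* First half of a re-initialisation: wipe the whole map (name, key, cache). */
static void toy_init_map_clear(struct toy_lockdep_map *lock)
{
	memset(lock, 0, sizeof(*lock));
}

/* Second half of a re-initialisation: put name and key back. */
static void toy_init_map_fill(struct toy_lockdep_map *lock,
			      const char *name, const void *key)
{
	lock->name = name;
	lock->key = key;
}

/* Decision of the registering side: 1 means "would warn", 0 means "key accepted". */
static int toy_register_would_warn(const struct toy_lockdep_map *lock)
{
	assert(lock != NULL);
	if (lock->class_cache)
		return 0; /* cached class found, the key is never inspected */
	return !toy_static_obj(lock->key);
}

struct toy_race_result {
	int before; /* map fully initialised */
	int during; /* checked inside the window after the memset() */
	int after;  /* re-initialisation finished */
};

/* Replays the steps one after another in the order given in the diagram. */
struct toy_race_result toy_replay_race(void)
{
	struct toy_lockdep_map lock_a;
	struct toy_race_result r;

	toy_init_map_clear(&lock_a);
	toy_init_map_fill(&lock_a, "rq->lock", &toy_rq_lock_key);
	r.before = toy_register_would_warn(&lock_a);

	toy_init_map_clear(&lock_a);                 /* CPU A: memset() */
	r.during = toy_register_would_warn(&lock_a); /* CPU B: key check */

	toy_init_map_fill(&lock_a, "rq->lock", &toy_rq_lock_key); /* CPU A finishes */
	r.after = toy_register_would_warn(&lock_a);
	return r;
}
```

Calling `toy_replay_race()` reports a would-be warning only for the middle check. The replay is sequential on purpose, so it demonstrates the window but says nothing about how often two real CPUs would hit it.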
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
  2011-11-07  4:54 ` Yong Zhang
@ 2011-11-07  8:43 ` Sergey Senozhatsky
  0 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2011-11-07  8:43 UTC
To: Yong Zhang
Cc: Borislav Petkov, David Rientjes, Tejun Heo, Ingo Molnar,
    Peter Zijlstra, linux-kernel, Andrew Morton

On (11/07/11 12:54), Yong Zhang wrote:
> > > > Understood. If someone can come up with a simple patch which could
> > > > cover the case I mentioned before, that would be great.
> > > > /me goes to poke at it.
> > >
> > > I dunno whether this is related but I get the following on 3.1:
> > >
> >
> > I think this is a different problem. The check that the lockdep key is marked as `static' failed.
>
> Actually the lockdep_init_map() in __lock_set_class could lead to
> more problems, such as: a certain rq->lock could have a different 'key'
> from the one we gave it in sched_init(), because rq is defined statically.
>
> Given that, we could have another typical race:
>
> 	CPU A					CPU B
>   lock_set_subclass(lockA);
>    lock_set_class(lockA);
> 					/* lockA->class_cache[] is not set */
> 					register_lock_class(lockA);
> 					 look_up_lock_class(lockA); /* return NULL */
>    lockdep_init_map(lockA);
>     /* lockA->name is cleared */
>     memset(lockA);
> 					if (!static_obj(lock->key))
> 					/* we get warning here */
>     lock->name = name;
>
>
> So memset() in lockdep_init_map() is still the culprit IMHO.
>

Hm, agreed, that still could be the reason.

I guess a little more information may be helpful in some cases.

---

Print the key address when an attempt to register a non-static lock key is detected.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index e69434b..de8a996 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -729,7 +729,7 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
 	 */
 	if (!static_obj(lock->key)) {
 		debug_locks_off();
-		printk("INFO: trying to register non-static key.\n");
+		printk("INFO: trying to register non-static key at address %p.\n", lock->key);
 		printk("the code is fine but needs lockdep annotation.\n");
 		printk("turning off the locking correctness validator.\n");
 		dump_stack();
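As a rough user-space illustration of what the extra `%p` argument adds, the sketch below formats the patched message for an arbitrary key pointer. `toy_format_nonstatic_key_msg()` is a hypothetical helper, not a kernel function, and the exact text produced by `%p` is implementation-defined in user space, so no particular output is claimed. One plausible use, offered as an assumption rather than something stated in the thread: if the memset() race were the cause, the reported key would be expected to be NULL, whereas a genuinely missing annotation would be expected to show a non-NULL address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: builds the message the patched printk() would emit
 * for the given key pointer. Returns the snprintf() result, that is, the
 * number of characters the full message needs (excluding the terminator).
 */
int toy_format_nonstatic_key_msg(char *buf, size_t size, const void *key)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size,
			"INFO: trying to register non-static key at address %p.\n",
			(void *)key);
}
```

A caller would pass the suspect key, for example `toy_format_nonstatic_key_msg(buf, sizeof(buf), lock_key)` with a hypothetical `lock_key` pointer, and compare the printed address against the addresses of the objects it expects to be static.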
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
  2011-11-04  9:25 ` Borislav Petkov
  2011-11-04  9:31 ` Sergey Senozhatsky
@ 2011-11-04  9:34 ` Yong Zhang
  2011-11-04  9:51 ` Sergey Senozhatsky
  1 sibling, 1 reply; 26+ messages in thread
From: Yong Zhang @ 2011-11-04  9:34 UTC
To: Borislav Petkov
Cc: Sergey Senozhatsky, David Rientjes, Tejun Heo, Ingo Molnar,
    Peter Zijlstra, linux-kernel, Andrew Morton

On Fri, Nov 04, 2011 at 10:25:20AM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2011 at 03:53:54PM +0800, Yong Zhang wrote:
> > On Thu, Nov 03, 2011 at 10:45:06AM +0300, Sergey Senozhatsky wrote:
> > > On (11/03/11 15:27), Yong Zhang wrote:
> > > > Have you tried it? Though I haven't found time to polish it yet,
> > > > I think it will ease your concern.
> > > >
> > >
> > > I'm compiling the kernel with your patch right now.
> > > The whole point was just in
> > > case someone has a different approach or whatsoever.
> >
> > Understood. If someone can come up with a simple patch which could
> > cover the case I mentioned before, that would be great.
>
> /me goes to poke at it.
>
> I dunno whether this is related but I get the following on 3.1:

Maybe, so could you try my patches just sent out?
http://marc.info/?l=linux-kernel&m=132039886826672

Thanks,
Yong

>
> [ 5499.537074] INFO: trying to register non-static key.
> [ 5499.537080] the code is fine but needs lockdep annotation.
> [ 5499.537083] turning off the locking correctness validator.
> [ 5499.537088] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0 #1
> [ 5499.537091] Call Trace:
> [ 5499.537094] <IRQ> [<ffffffff8107beed>] __lock_acquire+0x165d/0x1e30
> [ 5499.537109] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> [ 5499.537115] [<ffffffff8107ccd3>] lock_acquire+0x93/0x160
> [ 5499.537120] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> [ 5499.537126] [<ffffffff814d9866>] _raw_spin_lock+0x36/0x50
> [ 5499.537130] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> [ 5499.537135] [<ffffffff810321fc>] double_rq_lock+0x2c/0x80
> [ 5499.537140] [<ffffffff81039195>] load_balance+0x215/0x6c0
> [ 5499.537146] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> [ 5499.537151] [<ffffffff810396fd>] rebalance_domains+0xbd/0x1d0
> [ 5499.537155] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> [ 5499.537161] [<ffffffff810398ec>] run_rebalance_domains+0xdc/0x130
> [ 5499.537166] [<ffffffff81048dcd>] __do_softirq+0xbd/0x290
> [ 5499.537173] [<ffffffff814dc42c>] call_softirq+0x1c/0x30
> [ 5499.537178] [<ffffffff81003eb5>] do_softirq+0x85/0xc0
> [ 5499.537183] [<ffffffff810492ce>] irq_exit+0x9e/0xc0
> [ 5499.537189] [<ffffffff8101ca9f>] smp_call_function_single_interrupt+0x2f/0x40
> [ 5499.537195] [<ffffffff814dbeb0>] call_function_single_interrupt+0x70/0x80
> [ 5499.537199] <EOI> [<ffffffff810096e6>] ? native_sched_clock+0x26/0x70
> [ 5499.537212] [<ffffffffa0038e1a>] ? acpi_idle_enter_simple+0xee/0x11f [processor]
> [ 5499.537221] [<ffffffffa0038e15>] ? acpi_idle_enter_simple+0xe9/0x11f [processor]
> [ 5499.537227] [<ffffffff813f8b1d>] cpuidle_idle_call+0xdd/0x350
> [ 5499.537233] [<ffffffff8100081f>] cpu_idle+0x6f/0xd0
> [ 5499.537238] [<ffffffff814cc665>] start_secondary+0x1ae/0x1b3
>
> --
> Regards/Gruss,
> Boris.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Only stand for myself
* Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
  2011-11-04  9:34 ` Yong Zhang
@ 2011-11-04  9:51 ` Sergey Senozhatsky
  0 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2011-11-04  9:51 UTC
To: Yong Zhang
Cc: Borislav Petkov, David Rientjes, Tejun Heo, Ingo Molnar,
    Peter Zijlstra, linux-kernel, Andrew Morton

On (11/04/11 17:34), Yong Zhang wrote:
> > > > I'm compiling the kernel with your patch right now.
> > > > The whole point was just in
> > > > case someone has a different approach or whatsoever.
> > >
> > > Understood. If someone can come up with a simple patch which could
> > > cover the case I mentioned before, that would be great.
> > > /me goes to poke at it.
> >
> > I dunno whether this is related but I get the following on 3.1:
>
> Maybe, so could you try my patches just sent out?
> http://marc.info/?l=linux-kernel&m=132039886826672
>

Sure, I'll try your patches. Later today or (most likely) during the upcoming
weekend, since I'm extremely busy at work right now.

Thanks,
	Sergey

> Thanks,
> Yong
>
> >
> > [ 5499.537074] INFO: trying to register non-static key.
> > [ 5499.537080] the code is fine but needs lockdep annotation.
> > [ 5499.537083] turning off the locking correctness validator.
> > [ 5499.537088] Pid: 0, comm: kworker/0:1 Not tainted 3.1.0 #1
> > [ 5499.537091] Call Trace:
> > [ 5499.537094] <IRQ> [<ffffffff8107beed>] __lock_acquire+0x165d/0x1e30
> > [ 5499.537109] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> > [ 5499.537115] [<ffffffff8107ccd3>] lock_acquire+0x93/0x160
> > [ 5499.537120] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> > [ 5499.537126] [<ffffffff814d9866>] _raw_spin_lock+0x36/0x50
> > [ 5499.537130] [<ffffffff810321fc>] ? double_rq_lock+0x2c/0x80
> > [ 5499.537135] [<ffffffff810321fc>] double_rq_lock+0x2c/0x80
> > [ 5499.537140] [<ffffffff81039195>] load_balance+0x215/0x6c0
> > [ 5499.537146] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> > [ 5499.537151] [<ffffffff810396fd>] rebalance_domains+0xbd/0x1d0
> > [ 5499.537155] [<ffffffff81039640>] ? load_balance+0x6c0/0x6c0
> > [ 5499.537161] [<ffffffff810398ec>] run_rebalance_domains+0xdc/0x130
> > [ 5499.537166] [<ffffffff81048dcd>] __do_softirq+0xbd/0x290
> > [ 5499.537173] [<ffffffff814dc42c>] call_softirq+0x1c/0x30
> > [ 5499.537178] [<ffffffff81003eb5>] do_softirq+0x85/0xc0
> > [ 5499.537183] [<ffffffff810492ce>] irq_exit+0x9e/0xc0
> > [ 5499.537189] [<ffffffff8101ca9f>] smp_call_function_single_interrupt+0x2f/0x40
> > [ 5499.537195] [<ffffffff814dbeb0>] call_function_single_interrupt+0x70/0x80
> > [ 5499.537199] <EOI> [<ffffffff810096e6>] ? native_sched_clock+0x26/0x70
> > [ 5499.537212] [<ffffffffa0038e1a>] ? acpi_idle_enter_simple+0xee/0x11f [processor]
> > [ 5499.537221] [<ffffffffa0038e15>] ? acpi_idle_enter_simple+0xe9/0x11f [processor]
> > [ 5499.537227] [<ffffffff813f8b1d>] cpuidle_idle_call+0xdd/0x350
> > [ 5499.537233] [<ffffffff8100081f>] cpu_idle+0x6f/0xd0
> > [ 5499.537238] [<ffffffff814cc665>] start_secondary+0x1ae/0x1b3
> >
> > --
> > Regards/Gruss,
> > Boris.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
>
> --
> Only stand for myself
>
end of thread, other threads:[~2011-11-07  8:46 UTC | newest]

Thread overview: 26+ messages
2011-10-15 20:12 WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() Sergey Senozhatsky
2011-10-15 21:42 ` David Rientjes
2011-10-15 22:23 ` Borislav Petkov
2011-10-15 22:32 ` David Rientjes
2011-10-16  5:09 ` Sergey Senozhatsky
2011-10-20 18:39 ` Borislav Petkov
2011-10-20 18:53 ` Sergey Senozhatsky
2011-10-20 19:07 ` Sergey Senozhatsky
2011-10-20 21:17 ` David Rientjes
2011-10-20 21:23 ` Tejun Heo
2011-10-20 21:31 ` David Rientjes
2011-10-20 21:36 ` Tejun Heo
2011-10-20 23:00 ` Sergey Senozhatsky
2011-10-21  9:14 ` David Rientjes
2011-10-21  9:26 ` Sergey Senozhatsky
2011-10-21  9:45 ` Yong Zhang
2011-11-03  7:17 ` Sergey Senozhatsky
2011-11-03  7:27 ` Yong Zhang
2011-11-03  7:45 ` Sergey Senozhatsky
2011-11-03  7:53 ` Yong Zhang
2011-11-04  9:25 ` Borislav Petkov
2011-11-04  9:31 ` Sergey Senozhatsky
2011-11-07  4:54 ` Yong Zhang
2011-11-07  8:43 ` Sergey Senozhatsky
2011-11-04  9:34 ` Yong Zhang
2011-11-04  9:51 ` Sergey Senozhatsky