From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
Date: Fri, 21 Oct 2011 12:26:20 +0300	[thread overview]
Message-ID: <20111021092620.GA25694@swordfish> (raw)
In-Reply-To: <alpine.DEB.2.00.1110210212130.12963@chino.kir.corp.google.com>

On (10/21/11 02:14), David Rientjes wrote:
> > I thought I had started to understand this, but that feeling was wrong.
> > 
> > The error is indeed that the class name and the lock name do not match:
> > 
> >  689                 if (class->key == key) {
> >  690                         WARN_ON_ONCE(class->name != lock->name);
> >  691                         return class;
> >  692                 }
> > 
> > And, as far as I understand, the problem only shows up when active_load_balance_cpu_stop() gets
> > called on an rq with active_balance set.
> > 
> > double_unlock_balance() is called with the busiest_rq spinlock held, and I don't see anything
> > nearby that calls lockdep_init_map() on busiest_rq. The work_struct has its own lockdep_map,
> > which is touched after __queue_work(cpu, wq, work).
> > 
> > I'm not sure that reverting is the best option we have, since it doesn't fix the possible
> > race condition, it just masks it.
> > 
> 
> How does it mask the race condition?  Before the memset(), the ->name 
> field was never _cleared_ in lockdep_init_map() like it is now, it was 
> only stored.
>

Well, if we have a race condition between the `reader' and the `writer', then it's just luck that we only
hit it on the ->name modification. It could equally be `->cpu = raw_smp_processor_id()', or the loop that
NULLs out `class_cache'. The current implementation may only race on `->name', but in theory we have a
whole bunch of opportunities. Of course, I may be wrong.
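
Something like this userspace toy is the kind of interleaving I have in mind (just a sketch of the
pattern; the struct, the function names and the string here are made up for illustration, it is not
the kernel code):

/* Userspace toy, NOT kernel code: two threads poking at one "lockdep map". */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct toy_map {				/* stands in for struct lockdep_map */
	const char * volatile name;		/* volatile only so the reader re-loads it */
	void *class_cache;
};

static const char lockname[] = "rq->lock";
static struct toy_map map = { .name = lockname };
static const char *registered_name = lockname;	/* stands in for class->name */

static void *reinit(void *arg)			/* stands in for lockdep_init_map() */
{
	int i;

	(void)arg;
	for (i = 0; i < 100000000; i++) {
		memset(&map, 0, sizeof(map));	/* ->name is transiently NULL here */
		map.name = lockname;
	}
	return NULL;
}

static void *lookup(void *arg)			/* stands in for look_up_lock_class() */
{
	int i;

	(void)arg;
	for (i = 0; i < 100000000; i++) {
		if (map.name != registered_name) {	/* the WARN_ON_ONCE() condition */
			fprintf(stderr, "name mismatch observed\n");
			break;
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t writer, reader;

	pthread_create(&writer, NULL, reinit, NULL);
	pthread_create(&reader, NULL, lookup, NULL);
	pthread_join(writer, NULL);
	pthread_join(reader, NULL);
	return 0;
}

It needs `gcc -pthread' to build; whether the mismatch actually fires depends on timing, but the
window it pokes at is the same sort of window the memset() opens before ->name is stored again.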

 
> > I'm not having much luck reproducing the issue; in fact, I've had only one trace so far.
> > 
> > [10172.218213] ------------[ cut here ]------------
> > [10172.218233] WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b()
> > [10172.218346]  [<ffffffff8103e7c8>] warn_slowpath_common+0x7e/0x96
> > [10172.218353]  [<ffffffff8103e7f5>] warn_slowpath_null+0x15/0x17
> > [10172.218361]  [<ffffffff8106fee5>] __lock_acquire+0x168/0x164b
> > [10172.218370]  [<ffffffff81034645>] ? find_busiest_group+0x7b6/0x941
> > [10172.218381]  [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> > [10172.218389]  [<ffffffff8107197e>] lock_acquire+0x138/0x1ac
> > [10172.218397]  [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> > [10172.218404]  [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
> > [10172.218414]  [<ffffffff8148fb49>] _raw_spin_lock_nested+0x3a/0x49
> > [10172.218421]  [<ffffffff8102a5e3>] ? double_rq_lock+0x4d/0x52
> > [10172.218428]  [<ffffffff8148fabe>] ? _raw_spin_lock+0x3e/0x45
> > [10172.218435]  [<ffffffff8102a5c4>] ? double_rq_lock+0x2e/0x52
> > [10172.218442]  [<ffffffff8102a5e3>] double_rq_lock+0x4d/0x52
> > [10172.218449]  [<ffffffff810349cc>] load_balance+0x1fc/0x769
> > [10172.218458]  [<ffffffff810075c5>] ? native_sched_clock+0x38/0x65
> > [10172.218466]  [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
> > [10172.218474]  [<ffffffff8148caf5>] __schedule+0x3d3/0xa2d
> > [10172.218480]  [<ffffffff8148ca17>] ? __schedule+0x2f5/0xa2d
> > [10172.218490]  [<ffffffff8104db06>] ? add_timer_on+0xd/0x196
> > [10172.218497]  [<ffffffff8148fc02>] ? _raw_spin_lock_irq+0x4a/0x51
> > [10172.218505]  [<ffffffff8105907b>] ? process_one_work+0x3ed/0x54c
> > [10172.218512]  [<ffffffff81059126>] ? process_one_work+0x498/0x54c
> > [10172.218518]  [<ffffffff81058e1b>] ? process_one_work+0x18d/0x54c
> > [10172.218526]  [<ffffffff814902d0>] ? _raw_spin_unlock_irq+0x28/0x56
> > [10172.218533]  [<ffffffff81033950>] ? get_parent_ip+0xe/0x3e
> > [10172.218540]  [<ffffffff8148d26e>] schedule+0x55/0x57
> > [10172.218547]  [<ffffffff8105970f>] worker_thread+0x217/0x21c
> > [10172.218554]  [<ffffffff810594f8>] ? manage_workers.isra.21+0x16c/0x16c
> > [10172.218564]  [<ffffffff8105d4de>] kthread+0x9a/0xa2
> > [10172.218573]  [<ffffffff81497984>] kernel_thread_helper+0x4/0x10
> > [10172.218580]  [<ffffffff8102d6d2>] ? finish_task_switch+0x76/0xf3
> > [10172.218587]  [<ffffffff81490778>] ? retint_restore_args+0x13/0x13
> > [10172.218595]  [<ffffffff8105d444>] ? __init_kthread_worker+0x53/0x53
> > [10172.218602]  [<ffffffff81497980>] ? gs_change+0x13/0x13
> > [10172.218607] ---[ end trace 9d11d6b5e4b96730 ]---
> > 
> 
> This is with the revert?
> 

Nope, sorry for being unclear; this is the only trace I've got.

	Sergey

Thread overview: 26+ messages
2011-10-15 20:12 WARNING: at kernel/lockdep.c:690 __lock_acquire+0x168/0x164b() Sergey Senozhatsky
2011-10-15 21:42 ` David Rientjes
2011-10-15 22:23   ` Borislav Petkov
2011-10-15 22:32     ` David Rientjes
2011-10-16  5:09       ` Sergey Senozhatsky
2011-10-20 18:39       ` Borislav Petkov
2011-10-20 18:53         ` Sergey Senozhatsky
2011-10-20 19:07           ` Sergey Senozhatsky
2011-10-20 21:17             ` David Rientjes
2011-10-20 21:23               ` Tejun Heo
2011-10-20 21:31                 ` David Rientjes
2011-10-20 21:36                   ` Tejun Heo
2011-10-20 23:00                     ` Sergey Senozhatsky
2011-10-21  9:14                       ` David Rientjes
2011-10-21  9:26                         ` Sergey Senozhatsky [this message]
2011-10-21  9:45                         ` Yong Zhang
2011-11-03  7:17                           ` Sergey Senozhatsky
2011-11-03  7:27                             ` Yong Zhang
2011-11-03  7:45                               ` Sergey Senozhatsky
2011-11-03  7:53                                 ` Yong Zhang
2011-11-04  9:25                                   ` Borislav Petkov
2011-11-04  9:31                                     ` Sergey Senozhatsky
2011-11-07  4:54                                       ` Yong Zhang
2011-11-07  8:43                                         ` Sergey Senozhatsky
2011-11-04  9:34                                     ` Yong Zhang
2011-11-04  9:51                                       ` Sergey Senozhatsky
