public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: Mike Travis <travis@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PULL] cpumask tree
Date: Sat, 3 Jan 2009 17:42:55 +0100
Message-ID: <20090103164255.GA20657@elte.hu>
In-Reply-To: <495F8DCA.1060905@sgi.com>


* Mike Travis <travis@sgi.com> wrote:

> Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> > 
> >> i suspect it's:
> >>
> >> | commit 2d22bd5e74519854458ad372a89006e65f45e628
> >> | Author: Mike Travis <travis@sgi.com>
> >> | Date:   Wed Dec 31 18:08:46 2008 -0800
> >> |
> >> |     x86: cleanup remaining cpumask_t code in microcode_core.c
> >>
> >> as the microcode is loaded during CPU onlining.
> > 
> > yep, that's the bad one. Should i revert it or do you have a safe fix in 
> > mind?
> > 
> > 	Ingo
> 
> Probably revert for now.  There are a few more following patches that 
> also use 'work_on_cpu' so a better (more global?) fix should be used.
> 
> Any thought on using a recursive lock for cpu-hotplug-lock?  (At least 
> for get_online_cpus()?)

but the problem has nothing to do with self-recursion. Take a look at the 
lockdep warning i posted (also below) - the locks are simply taken in the 
wrong order.
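
The inversion can be replayed with a toy version of what lockdep does. This is a hypothetical userspace sketch, not kernel code: the lock_class enum, the dep table, record_nesting() and replay_report() are invented for illustration. Unlike real lockdep, it only checks the direct reverse edge, which is enough for two locks. It first records the nesting from the microcode_init() chain and then flags the cpu_down() chain:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified model of lockdep's dependency tracking.
 * Only two lock classes, so checking the direct reverse edge is enough;
 * real lockdep searches the whole dependency graph for cycles. */
enum lock_class { CPU_HOTPLUG_LOCK, SYSDEV_DRIVERS_LOCK, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
	"cpu_hotplug.lock",
	"sysdev_drivers_lock",
};

/* dep[a][b] is true once lock b has been acquired while lock a was held */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Record "acquire 'next' while holding 'held'"; return false if the
 * opposite ordering has already been observed (an ABBA inversion). */
static bool record_nesting(enum lock_class held, enum lock_class next)
{
	if (dep[next][held]) {
		printf("possible circular locking dependency: %s -> %s, "
		       "but %s -> %s already recorded\n",
		       class_name[held], class_name[next],
		       class_name[next], class_name[held]);
		return false;
	}
	dep[held][next] = true;
	return true;
}

/* Replays the two call chains from the lockdep report; returns true if
 * no inversion is detected. */
static bool replay_report(void)
{
	bool ok;

	for (int a = 0; a < NR_CLASSES; a++)
		for (int b = 0; b < NR_CLASSES; b++)
			dep[a][b] = false;

	/* Boot: sysdev_driver_register() holds sysdev_drivers_lock while
	 * mc_sysdev_add() -> work_on_cpu() -> get_online_cpus() takes
	 * cpu_hotplug.lock (chain #1 in the report). */
	ok = record_nesting(SYSDEV_DRIVERS_LOCK, CPU_HOTPLUG_LOCK);

	/* CPU offline: cpu_hotplug_begin() holds cpu_hotplug.lock while
	 * mce_cpu_callback() -> sysdev_unregister() takes
	 * sysdev_drivers_lock (chain #0 in the report). */
	ok = record_nesting(CPU_HOTPLUG_LOCK, SYSDEV_DRIVERS_LOCK) && ok;

	return ok;
}
```

Neither chain re-acquires a lock it already holds, which is why making the hotplug lock recursive would not remove this cycle.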

your change adds this cpu_hotplug.lock usage:

[   43.652000] -> #1 (&cpu_hotplug.lock){--..}:
[   43.652000]        [<ffffffff8027a7c0>] __lock_acquire+0xf10/0x1360
[   43.652000]        [<ffffffff8027aca9>] lock_acquire+0x99/0xd0
[   43.652000]        [<ffffffff809b5e4a>] __mutex_lock_common+0xaa/0x450
[   43.652000]        [<ffffffff809b62cf>] mutex_lock_nested+0x3f/0x50
[   43.652000]        [<ffffffff802516ba>] get_online_cpus+0x3a/0x50
[   43.652000]        [<ffffffff802648dc>] work_on_cpu+0x6c/0xc0
[   43.652000]        [<ffffffff8022b2a2>] mc_sysdev_add+0x92/0xa0
[   43.652000]        [<ffffffff8050a800>] sysdev_driver_register+0xb0/0x140
[   43.652000]        [<ffffffff8163c792>] microcode_init+0xb2/0x13b
[   43.652000]        [<ffffffff8020a041>] do_one_initcall+0x41/0x180
[   43.652000]        [<ffffffff8162e6cb>] kernel_init+0x145/0x19d
[   43.652000]        [<ffffffff802146aa>] child_rip+0xa/0x20
[   43.652000]        [<ffffffffffffffff>] 0xffffffffffffffff

which nests cpu_hotplug.lock inside sysdev_drivers_lock - which is wrong
[sysdev_drivers_lock is a pretty low-level lock that generally nests inside
the CPU hotplug lock].

If you want to use work_on_cpu(), it should be done at a higher level, so
that sysdev_drivers_lock is taken after the hotplug lock.
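
The ordering rule above can be phrased as a lock hierarchy. The sketch below is a hypothetical userspace model, not kernel code and not the fix that was actually applied: the ranks, struct held_locks and the two microcode_init_*() functions are made up for illustration, and pinning CPU hotplug is treated as holding the hotplug lock for the whole section, which simplifies the real get_online_cpus() scheme. It contrasts the current nesting with one where the hotplug-pinning work_on_cpu() call is moved out of the sysdev ->add() callback:

```c
#include <stdbool.h>

/* Hypothetical lock ranks encoding "sysdev_drivers_lock nests inside the
 * CPU hotplug lock": an outer lock gets a lower rank than an inner one. */
enum lock_rank {
	RANK_CPU_HOTPLUG = 1,		/* outer: must be taken first */
	RANK_SYSDEV_DRIVERS = 2,	/* inner: taken while hotplug is pinned */
};

#define MAX_HELD 8

struct held_locks {
	int rank[MAX_HELD];
	int depth;
};

/* Model an acquire. Held ranks are kept strictly increasing, so the top
 * entry is the highest held rank; refuse (return false) if it is >= r,
 * i.e. if an outer lock would nest inside an inner one. */
static bool rank_acquire(struct held_locks *h, enum lock_rank r)
{
	if (h->depth == MAX_HELD)
		return false;
	if (h->depth > 0 && h->rank[h->depth - 1] >= (int)r)
		return false;
	h->rank[h->depth++] = (int)r;
	return true;
}

static void rank_release_all(struct held_locks *h)
{
	h->depth = 0;
}

/* Current shape: the sysdev ->add() callback calls work_on_cpu(), which
 * pins CPU hotplug while sysdev_drivers_lock is already held. */
static bool microcode_init_nested_inside(void)
{
	struct held_locks h = { .depth = 0 };
	bool ok = rank_acquire(&h, RANK_SYSDEV_DRIVERS);	/* sysdev_driver_register() */

	ok = ok && rank_acquire(&h, RANK_CPU_HOTPLUG);		/* work_on_cpu() -> get_online_cpus() */
	rank_release_all(&h);
	return ok;
}

/* Reordered shape (assumption): pin hotplug at a higher level first and
 * only then take sysdev_drivers_lock; nothing inside the ->add() callback
 * touches the hotplug lock. */
static bool microcode_init_hotplug_first(void)
{
	struct held_locks h = { .depth = 0 };
	bool ok = rank_acquire(&h, RANK_CPU_HOTPLUG);		/* higher-level pinning */

	ok = ok && rank_acquire(&h, RANK_SYSDEV_DRIVERS);	/* sysdev_driver_register() */
	rank_release_all(&h);
	return ok;
}
```

The cpu_down() path in the report already follows this order: cpu_hotplug_begin() takes cpu_hotplug.lock first, and only then does sysdev_unregister() take sysdev_drivers_lock.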

	Ingo

[   43.376051] lockdep: fixing up alternatives.
[   43.380007] SMP alternatives: switching to UP code
[   43.616014] CPU0 attaching NULL sched-domain.
[   43.620068] CPU1 attaching NULL sched-domain.
[   43.644482] CPU0 attaching NULL sched-domain.
[   43.648264] 
[   43.648265] =======================================================
[   43.652000] [ INFO: possible circular locking dependency detected ]
[   43.652000] 2.6.28-05081-geeff031-dirty #37
[   43.652000] -------------------------------------------------------
[   43.652000] S99local/1238 is trying to acquire lock:
[   43.652000]  (sysdev_drivers_lock){--..}, at: [<ffffffff8050a52d>] sysdev_unregister+0x1d/0x80
[   43.652000] 
[   43.652000] but task is already holding lock:
[   43.652000]  (&cpu_hotplug.lock){--..}, at: [<ffffffff802515d7>] cpu_hotplug_begin+0x27/0x60
[   43.652000] 
[   43.652000] which lock already depends on the new lock.
[   43.652000] 
[   43.652000] 
[   43.652000] the existing dependency chain (in reverse order) is:
[   43.652000] 
[   43.652000] -> #1 (&cpu_hotplug.lock){--..}:
[   43.652000]        [<ffffffff8027a7c0>] __lock_acquire+0xf10/0x1360
[   43.652000]        [<ffffffff8027aca9>] lock_acquire+0x99/0xd0
[   43.652000]        [<ffffffff809b5e4a>] __mutex_lock_common+0xaa/0x450
[   43.652000]        [<ffffffff809b62cf>] mutex_lock_nested+0x3f/0x50
[   43.652000]        [<ffffffff802516ba>] get_online_cpus+0x3a/0x50
[   43.652000]        [<ffffffff802648dc>] work_on_cpu+0x6c/0xc0
[   43.652000]        [<ffffffff8022b2a2>] mc_sysdev_add+0x92/0xa0
[   43.652000]        [<ffffffff8050a800>] sysdev_driver_register+0xb0/0x140
[   43.652000]        [<ffffffff8163c792>] microcode_init+0xb2/0x13b
[   43.652000]        [<ffffffff8020a041>] do_one_initcall+0x41/0x180
[   43.652000]        [<ffffffff8162e6cb>] kernel_init+0x145/0x19d
[   43.652000]        [<ffffffff802146aa>] child_rip+0xa/0x20
[   43.652000]        [<ffffffffffffffff>] 0xffffffffffffffff
[   43.652000] 
[   43.652000] -> #0 (sysdev_drivers_lock){--..}:
[   43.652000]        [<ffffffff8027a89c>] __lock_acquire+0xfec/0x1360
[   43.652000]        [<ffffffff8027aca9>] lock_acquire+0x99/0xd0
[   43.652000]        [<ffffffff809b5e4a>] __mutex_lock_common+0xaa/0x450
[   43.652000]        [<ffffffff809b62cf>] mutex_lock_nested+0x3f/0x50
[   43.652000]        [<ffffffff8050a52d>] sysdev_unregister+0x1d/0x80
[   43.652000]        [<ffffffff809af9d1>] mce_cpu_callback+0xce/0x101
[   43.652000]        [<ffffffff809bbb75>] notifier_call_chain+0x65/0xa0
[   43.652000]        [<ffffffff8026d696>] raw_notifier_call_chain+0x16/0x20
[   43.652000]        [<ffffffff80964a00>] _cpu_down+0x240/0x350
[   43.652000]        [<ffffffff80964b8b>] cpu_down+0x7b/0xa0
[   43.652000]        [<ffffffff80966268>] store_online+0x48/0xa0
[   43.652000]        [<ffffffff80509e90>] sysdev_store+0x20/0x30
[   43.652000]        [<ffffffff80335ddf>] sysfs_write_file+0xcf/0x140
[   43.652000]        [<ffffffff802dc1f7>] vfs_write+0xc7/0x150
[   43.652000]        [<ffffffff802dc375>] sys_write+0x55/0x90
[   43.652000]        [<ffffffff802133ca>] system_call_fastpath+0x16/0x1b
[   43.652000]        [<ffffffffffffffff>] 0xffffffffffffffff
[   43.652000] 
[   43.652000] other info that might help us debug this:
[   43.652000] 
[   43.652000] 3 locks held by S99local/1238:
[   43.652000]  #0:  (&buffer->mutex){--..}, at: [<ffffffff80335d58>] sysfs_write_file+0x48/0x140
[   43.652000]  #1:  (cpu_add_remove_lock){--..}, at: [<ffffffff80964b3f>] cpu_down+0x2f/0xa0
[   43.652000]  #2:  (&cpu_hotplug.lock){--..}, at: [<ffffffff802515d7>] cpu_hotplug_begin+0x27/0x60
[   43.652000] 
[   43.652000] stack backtrace:
[   43.652000] Pid: 1238, comm: S99local Not tainted 2.6.28-05081-geeff031-dirty #37
[   43.652000] Call Trace:
[   43.652000]  [<ffffffff80277f24>] print_circular_bug_tail+0xa4/0x100
[   43.652000]  [<ffffffff8027a89c>] __lock_acquire+0xfec/0x1360
[   43.652000]  [<ffffffff8027aca9>] lock_acquire+0x99/0xd0
[   43.652000]  [<ffffffff8050a52d>] ? sysdev_unregister+0x1d/0x80
[   43.652000]  [<ffffffff809b5e4a>] __mutex_lock_common+0xaa/0x450
[   43.652000]  [<ffffffff8050a52d>] ? sysdev_unregister+0x1d/0x80
[   43.652000]  [<ffffffff8050a52d>] ? sysdev_unregister+0x1d/0x80
[   43.652000]  [<ffffffff809b62cf>] mutex_lock_nested+0x3f/0x50
[   43.652000]  [<ffffffff8050a52d>] sysdev_unregister+0x1d/0x80
[   43.652000]  [<ffffffff809af9d1>] mce_cpu_callback+0xce/0x101
[   43.652000]  [<ffffffff809bbb75>] notifier_call_chain+0x65/0xa0
[   43.652000]  [<ffffffff8026d696>] raw_notifier_call_chain+0x16/0x20
[   43.652000]  [<ffffffff80964a00>] _cpu_down+0x240/0x350
[   43.652000]  [<ffffffff809b4763>] ? wait_for_common+0xe3/0x1b0
[   43.652000]  [<ffffffff80964b8b>] cpu_down+0x7b/0xa0
[   43.652000]  [<ffffffff80966268>] store_online+0x48/0xa0
[   43.652000]  [<ffffffff80509e90>] sysdev_store+0x20/0x30
[   43.652000]  [<ffffffff80335ddf>] sysfs_write_file+0xcf/0x140
[   43.652000]  [<ffffffff802dc1f7>] vfs_write+0xc7/0x150
[   43.652000]  [<ffffffff802dc375>] sys_write+0x55/0x90
[   43.652000]  [<ffffffff802133ca>] system_call_fastpath+0x16/0x1b
[   43.652104] device: 'msr1': device_unregister
[   43.656005] PM: Removing info for No Bus:msr1
