* [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L @ 2008-12-19 23:15 Dmitry Adamushko 2008-12-19 23:30 ` Ingo Molnar 2008-12-20 13:19 ` Andreas Mohr 0 siblings, 2 replies; 7+ messages in thread From: Dmitry Adamushko @ 2008-12-19 23:15 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Ingo Molnar, Andreas Mohr, Linux Kernel Mailing List, Kernel Testers List Hi, This is in response to the following bug report: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100 Subject : resume (S2R) broken by Intel microcode module, on A110L Submitter : Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> Date : 2008-11-25 08:48 (19 days old) Handled-By : Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> --- From: Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Subject: [microcode] fix a deadlock in microcode_resume_cpu() [ The deadlock scenario has been discovered by Andreas Mohr ] I think I might have a logical explanation why the system (http://bugzilla.kernel.org/show_bug.cgi?id=12100) might hang upon resuming, OTOH it should have likely hanged each and every time. (1) possible deadlock in microcode_resume_cpu() if either 'if' section is taken; (2) now, I don't see it in spec. and can't experimentally verify it (newer ucodes don't seem to be available for my Core2duo)... but logically-wise, I'd think that when read upon resuming, the 'microcode revision' (MSR 0x8B) should be back to its original one (we need to reload ucode anyway so it doesn't seem logical if a cpu doesn't drop the version)... if so, the comparison with memcmp() for the full 'struct cpu_signature' is wrong... and that's how one of the aforementioned 'if' sections might have been triggered - leading to a deadlock. Obviously, in my tests I simulated loading/resuming with the ucode of the same version (just to see that the file is loaded/re-loaded upon resuming) so this issue has never popped up. I'd appreciate if someone with an appropriate system might give a try to the 2nd patch (titled "fix a comparison && deadlock..."). In any case, the deadlock situation is a must-have fix. Signed-off-by: Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> CC: Info Molnar <mingo-X9Un+BFzKDI@public.gmane.org> CC: Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> --- diff --git a/arch/x86/kernel/microcode_core.c b/arch/x86/kernel/microcode_core.c index 82fb280..c4b5b24 100644 --- a/arch/x86/kernel/microcode_core.c +++ b/arch/x86/kernel/microcode_core.c @@ -272,13 +272,18 @@ static struct attribute_group mc_attr_group = { .name = "microcode", }; -static void microcode_fini_cpu(int cpu) +static void __microcode_fini_cpu(int cpu) { struct ucode_cpu_info *uci = ucode_cpu_info + cpu; - mutex_lock(µcode_mutex); microcode_ops->microcode_fini_cpu(cpu); uci->valid = 0; +} + +static void microcode_fini_cpu(int cpu) +{ + mutex_lock(µcode_mutex); + __microcode_fini_cpu(cpu); mutex_unlock(µcode_mutex); } @@ -306,12 +311,16 @@ static int microcode_resume_cpu(int cpu) * to this cpu (a bit of paranoia): */ if (microcode_ops->collect_cpu_info(cpu, &nsig)) { - microcode_fini_cpu(cpu); + __microcode_fini_cpu(cpu); + printk(KERN_ERR "failed to collect_cpu_info for resuming cpu #%d\n", + cpu); return -1; } - if (memcmp(&nsig, &uci->cpu_sig, sizeof(nsig))) { - microcode_fini_cpu(cpu); + if ((nsig.sig != uci->cpu_sig.sig) || (nsig.pf != uci->cpu_sig.pf)) { + __microcode_fini_cpu(cpu); + printk(KERN_ERR "cached ucode doesn't match the resuming cpu #%d\n", + cpu); /* Should we look for a new ucode here? */ return 1; } diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c index 622dc4a..a8e6279 100644 --- a/arch/x86/kernel/microcode_intel.c +++ b/arch/x86/kernel/microcode_intel.c @@ -155,6 +155,7 @@ static DEFINE_SPINLOCK(microcode_update_lock); static int collect_cpu_info(int cpu_num, struct cpu_signature *csig) { struct cpuinfo_x86 *c = &cpu_data(cpu_num); + unsigned long flags; unsigned int val[2]; memset(csig, 0, sizeof(*csig)); @@ -174,11 +175,16 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig) csig->pf = 1 << ((val[1] >> 18) & 7); } + /* serialize access to the physical write to MSR 0x79 */ + spin_lock_irqsave(µcode_update_lock, flags); + wrmsr(MSR_IA32_UCODE_REV, 0, 0); /* see notes above for revision 1.07. Apparent chip bug */ sync_core(); /* get the current revision from MSR 0x8B */ rdmsr(MSR_IA32_UCODE_REV, val[0], csig->rev); + spin_unlock_irqrestore(µcode_update_lock, flags); + pr_debug("microcode: collect_cpu_info : sig=0x%x, pf=0x%x, rev=0x%x\n", csig->sig, csig->pf, csig->rev); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L 2008-12-19 23:15 [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L Dmitry Adamushko @ 2008-12-19 23:30 ` Ingo Molnar [not found] ` <20081219233006.GA17984-X9Un+BFzKDI@public.gmane.org> 2008-12-20 13:19 ` Andreas Mohr 1 sibling, 1 reply; 7+ messages in thread From: Ingo Molnar @ 2008-12-19 23:30 UTC (permalink / raw) To: Dmitry Adamushko Cc: Rafael J. Wysocki, Andreas Mohr, Linux Kernel Mailing List, Kernel Testers List * Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > Hi, > > > This is in response to the following bug report: > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100 > Subject : resume (S2R) broken by Intel microcode module, on A110L > Submitter : Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> > Date : 2008-11-25 08:48 (19 days old) > Handled-By : Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> applied to tip/x86/microcode, thanks Dmitry! The fix looks right but somewhat intrusive in scope, so i'm a bit reluctant to push it towards .28 straight away - without having feedback in the bugzilla. If feedback is positive (the bug reported there goes away completely) we can cherry-pick it over into x86/urgent, ok? And in any case i've marked it as a -stable backport for .28.1. Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20081219233006.GA17984-X9Un+BFzKDI@public.gmane.org>]
* Re: [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L [not found] ` <20081219233006.GA17984-X9Un+BFzKDI@public.gmane.org> @ 2009-01-11 14:56 ` Ingo Molnar [not found] ` <20090111145615.GA26173-X9Un+BFzKDI@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Ingo Molnar @ 2009-01-11 14:56 UTC (permalink / raw) To: Dmitry Adamushko, andeas.herrmann3-5C7GfCeVMHo, Peter Zijlstra Cc: Rafael J. Wysocki, Andreas Mohr, Linux Kernel Mailing List, Kernel Testers List * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote: > * Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: > > > Hi, > > > > > > This is in response to the following bug report: > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100 > > Subject : resume (S2R) broken by Intel microcode module, on A110L > > Submitter : Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> > > Date : 2008-11-25 08:48 (19 days old) > > Handled-By : Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > > applied to tip/x86/microcode, thanks Dmitry! > > The fix looks right but somewhat intrusive in scope, so i'm a bit > reluctant to push it towards .28 straight away - without having feedback > in the bugzilla. If feedback is positive (the bug reported there goes > away completely) we can cherry-pick it over into x86/urgent, ok? And in > any case i've marked it as a -stable backport for .28.1. hm, -tip testing just found this microcode locking lockdep splat: [ 48.004158] SMP alternatives: switching to UP code [ 48.342853] CPU0 attaching NULL sched-domain. [ 48.344288] CPU1 attaching NULL sched-domain. [ 48.354696] CPU0 attaching NULL sched-domain. [ 48.361215] device: 'cpu1': device_unregister [ 48.364231] device: 'cpu1': device_create_release [ 48.368138] [ 48.368139] ======================================================= [ 48.372039] [ INFO: possible circular locking dependency detected ] [ 48.372039] 2.6.29-rc1-tip-00901-g9699183-dirty #15577 [ 48.372039] ------------------------------------------------------- [ 48.372039] S99local/3496 is trying to acquire lock: [ 48.372039] (microcode_mutex){--..}, at: [<c0118489>] microcode_fini_cpu+0x17/0x2b [ 48.372039] [ 48.372039] but task is already holding lock: [ 48.372039] (&cpu_hotplug.lock){--..}, at: [<c012f508>] cpu_hotplug_begin+0x1f/0x47 [ 48.372039] [ 48.372039] which lock already depends on the new lock. [ 48.372039] [ 48.372039] [ 48.372039] the existing dependency chain (in reverse order) is: [ 48.372039] [ 48.372039] -> #1 (&cpu_hotplug.lock){--..}: [ 48.372039] [<c014d3f1>] validate_chain+0x8e9/0xb94 [ 48.372039] [<c014dd03>] __lock_acquire+0x667/0x6e1 [ 48.372039] [<c014ddda>] lock_acquire+0x5d/0x7a [ 48.372039] [<c0a6fac3>] mutex_lock_nested+0xdc/0x170 [ 48.372039] [<c012f552>] get_online_cpus+0x22/0x34 [ 48.372039] [<c013ce08>] work_on_cpu+0x50/0x8a [ 48.372039] [<c0118465>] microcode_init_cpu+0x25/0x32 [ 48.372039] [<c0118699>] mc_sysdev_add+0x91/0x9b [ 48.372039] [<c04cbd09>] sysdev_driver_register+0x9b/0xea [ 48.372039] [<c0fb68e0>] microcode_init+0x8a/0xe4 [ 48.372039] [<c010106a>] do_one_initcall+0x6a/0x16e [ 48.372039] [<c0fa952d>] kernel_init+0x115/0x166 [ 48.372039] [<c0103737>] kernel_thread_helper+0x7/0x10 [ 48.372039] [<ffffffff>] 0xffffffff [ 48.372039] [ 48.372039] -> #0 (microcode_mutex){--..}: [ 48.372039] [<c014d0fc>] validate_chain+0x5f4/0xb94 [ 48.372039] [<c014dd03>] __lock_acquire+0x667/0x6e1 [ 48.372039] [<c014ddda>] lock_acquire+0x5d/0x7a [ 48.372039] [<c0a6fac3>] mutex_lock_nested+0xdc/0x170 [ 48.372039] [<c0118489>] microcode_fini_cpu+0x17/0x2b [ 48.372039] [<c0a6ce38>] mc_cpu_callback+0xed/0xfa [ 48.372039] [<c0142a48>] notifier_call_chain+0x2b/0x4a [ 48.372039] [<c0142a98>] __raw_notifier_call_chain+0x13/0x15 [ 48.372039] [<c0142aab>] raw_notifier_call_chain+0x11/0x13 [ 48.372039] [<c0a2925e>] _cpu_down+0x171/0x22a [ 48.372039] [<c0a2935a>] cpu_down+0x43/0x68 [ 48.372039] [<c0a2a27f>] store_online+0x2a/0x5e [ 48.372039] [<c04cba85>] sysdev_store+0x20/0x28 [ 48.372039] [<c01d07d0>] sysfs_write_file+0xbd/0xe8 [ 48.372039] [<c01912c2>] vfs_write+0x91/0x138 [ 48.372039] [<c01917c8>] sys_write+0x40/0x65 [ 48.372039] [<c0102e55>] sysenter_do_call+0x12/0x35 [ 48.372039] [<ffffffff>] 0xffffffff [ 48.372039] [ 48.372039] other info that might help us debug this: [ 48.372039] [ 48.372039] 3 locks held by S99local/3496: [ 48.372039] #0: (&buffer->mutex){--..}, at: [<c01d073d>] sysfs_write_file+0x2a/0xe8 [ 48.372039] #1: (cpu_add_remove_lock){--..}, at: [<c012f4b5>] cpu_maps_update_begin+0x14/0x16 [ 48.372039] #2: (&cpu_hotplug.lock){--..}, at: [<c012f508>] cpu_hotplug_begin+0x1f/0x47 [ 48.372039] [ 48.372039] stack backtrace: [ 48.372039] Pid: 3496, comm: S99local Not tainted 2.6.29-rc1-tip-00901-g9699183-dirty #15577 [ 48.372039] Call Trace: [ 48.372039] [<c014cafd>] print_circular_bug_tail+0xab/0xb6 [ 48.372039] [<c014d0fc>] validate_chain+0x5f4/0xb94 [ 48.372039] [<c0a710b4>] ? _spin_unlock_irqrestore+0x34/0x41 [ 48.372039] [<c014dd03>] __lock_acquire+0x667/0x6e1 [ 48.372039] [<c014c508>] ? trace_hardirqs_on_caller+0x120/0x15f [ 48.372039] [<c014ddda>] lock_acquire+0x5d/0x7a [ 48.372039] [<c0118489>] ? microcode_fini_cpu+0x17/0x2b [ 48.372039] [<c0a6fac3>] mutex_lock_nested+0xdc/0x170 [ 48.372039] [<c0118489>] ? microcode_fini_cpu+0x17/0x2b [ 48.372039] [<c0118489>] ? microcode_fini_cpu+0x17/0x2b [ 48.372039] [<c0118489>] microcode_fini_cpu+0x17/0x2b [ 48.372039] [<c0a6ce38>] mc_cpu_callback+0xed/0xfa [ 48.372039] [<c0142a48>] notifier_call_chain+0x2b/0x4a [ 48.372039] [<c0142a98>] __raw_notifier_call_chain+0x13/0x15 [ 48.372039] [<c0142aab>] raw_notifier_call_chain+0x11/0x13 [ 48.372039] [<c0a2925e>] _cpu_down+0x171/0x22a [ 48.372039] [<c0a2935a>] cpu_down+0x43/0x68 [ 48.372039] [<c0a2a27f>] store_online+0x2a/0x5e [ 48.372039] [<c0a2a255>] ? store_online+0x0/0x5e [ 48.372039] [<c04cba85>] sysdev_store+0x20/0x28 [ 48.372039] [<c01d07d0>] sysfs_write_file+0xbd/0xe8 [ 48.372039] [<c01d0713>] ? sysfs_write_file+0x0/0xe8 [ 48.372039] [<c01912c2>] vfs_write+0x91/0x138 [ 48.372039] [<c01917c8>] sys_write+0x40/0x65 [ 48.372039] [<c0102e55>] sysenter_do_call+0x12/0x35 [ 49.380693] device: 'cpu1': device_add [ 49.384346] lockdep: fixing up alternatives. [ 49.388142] SMP alternatives: switching to SMP code config/full bootlog on request. Andreas, Dmitry, any ideas? Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20090111145615.GA26173-X9Un+BFzKDI@public.gmane.org>]
* Re: [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L [not found] ` <20090111145615.GA26173-X9Un+BFzKDI@public.gmane.org> @ 2009-01-11 19:58 ` Dmitry Adamushko [not found] ` <b647ffbd0901111158r467f3358wa50f6a6e92d8129f-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Dmitry Adamushko @ 2009-01-11 19:58 UTC (permalink / raw) To: Ingo Molnar Cc: andeas.herrmann3-5C7GfCeVMHo, Peter Zijlstra, Rafael J. Wysocki, Andreas Mohr, Linux Kernel Mailing List, Kernel Testers List 2009/1/11 Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>: > > * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote: > >> * Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> >> > Hi, >> > >> > >> > This is in response to the following bug report: >> > >> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100 >> > Subject : resume (S2R) broken by Intel microcode module, on A110L >> > Submitter : Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> >> > Date : 2008-11-25 08:48 (19 days old) >> > Handled-By : Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> >> >> applied to tip/x86/microcode, thanks Dmitry! >> >> The fix looks right but somewhat intrusive in scope, so i'm a bit >> reluctant to push it towards .28 straight away - without having feedback >> in the bugzilla. If feedback is positive (the bug reported there goes >> away completely) we can cherry-pick it over into x86/urgent, ok? And in >> any case i've marked it as a -stable backport for .28.1. > > hm, -tip testing just found this microcode locking lockdep splat: > > [ 48.004158] SMP alternatives: switching to UP code > [ 48.342853] CPU0 attaching NULL sched-domain. > [ 48.344288] CPU1 attaching NULL sched-domain. > [ 48.354696] CPU0 attaching NULL sched-domain. > [ 48.361215] device: 'cpu1': device_unregister > [ 48.364231] device: 'cpu1': device_create_release > [ 48.368138] > [ 48.368139] ======================================================= > [ 48.372039] [ INFO: possible circular locking dependency detected ] > [ 48.372039] 2.6.29-rc1-tip-00901-g9699183-dirty #15577 > [ 48.372039] ------------------------------------------------------- > [ 48.372039] S99local/3496 is trying to acquire lock: > [ 48.372039] (microcode_mutex){--..}, at: [<c0118489>] microcode_fini_cpu+0x17/0x2b > [ 48.372039] > [ 48.372039] but task is already holding lock: > [ 48.372039] (&cpu_hotplug.lock){--..}, at: [<c012f508>] cpu_hotplug_begin+0x1f/0x47 > [ 48.372039] > [ 48.372039] which lock already depends on the new lock. > [ 48.372039] > [ 48.372039] > [ 48.372039] the existing dependency chain (in reverse order) is: > [ 48.372039] > [ 48.372039] -> #1 (&cpu_hotplug.lock){--..}: > [ 48.372039] [<c014d3f1>] validate_chain+0x8e9/0xb94 > [ 48.372039] [<c014dd03>] __lock_acquire+0x667/0x6e1 > [ 48.372039] [<c014ddda>] lock_acquire+0x5d/0x7a > [ 48.372039] [<c0a6fac3>] mutex_lock_nested+0xdc/0x170 > [ 48.372039] [<c012f552>] get_online_cpus+0x22/0x34 > [ 48.372039] [<c013ce08>] work_on_cpu+0x50/0x8a > [ 48.372039] [<c0118465>] microcode_init_cpu+0x25/0x32 > [ 48.372039] [<c0118699>] mc_sysdev_add+0x91/0x9b > [ 48.372039] [<c04cbd09>] sysdev_driver_register+0x9b/0xea I'll check more carefully... At the first glance, the presence of work_on_cpu() looks strange. My first idea was that it's used somewhere by request_firmware() but even assuming some functions might have been inlined (and a call via a function pointer is not shown either), I don't immediately see how we might end up with microcode_init_cpu() -> ... -> work_on_cpu(). I've locked up all the use cases of work_on_cpu() in the current -tip (about 20), and none of them seem to explain its appearance in the trace. weird... > > Ingo > -- Best regards, Dmitry Adamushko ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <b647ffbd0901111158r467f3358wa50f6a6e92d8129f-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L [not found] ` <b647ffbd0901111158r467f3358wa50f6a6e92d8129f-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2009-01-11 20:11 ` Mike Travis 0 siblings, 0 replies; 7+ messages in thread From: Mike Travis @ 2009-01-11 20:11 UTC (permalink / raw) To: Dmitry Adamushko Cc: Ingo Molnar, andeas.herrmann3-5C7GfCeVMHo, Peter Zijlstra, Rafael J. Wysocki, Andreas Mohr, Linux Kernel Mailing List, Kernel Testers List, Rusty Russell Dmitry Adamushko wrote: > 2009/1/11 Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>: >> * Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org> wrote: >> >>> * Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >>> >>>> Hi, >>>> >>>> >>>> This is in response to the following bug report: >>>> >>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100 >>>> Subject : resume (S2R) broken by Intel microcode module, on A110L >>>> Submitter : Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> >>>> Date : 2008-11-25 08:48 (19 days old) >>>> Handled-By : Dmitry Adamushko <dmitry.adamushko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> >>> applied to tip/x86/microcode, thanks Dmitry! >>> >>> The fix looks right but somewhat intrusive in scope, so i'm a bit >>> reluctant to push it towards .28 straight away - without having feedback >>> in the bugzilla. If feedback is positive (the bug reported there goes >>> away completely) we can cherry-pick it over into x86/urgent, ok? And in >>> any case i've marked it as a -stable backport for .28.1. >> hm, -tip testing just found this microcode locking lockdep splat: >> >> [ 48.004158] SMP alternatives: switching to UP code >> [ 48.342853] CPU0 attaching NULL sched-domain. >> [ 48.344288] CPU1 attaching NULL sched-domain. >> [ 48.354696] CPU0 attaching NULL sched-domain. >> [ 48.361215] device: 'cpu1': device_unregister >> [ 48.364231] device: 'cpu1': device_create_release >> [ 48.368138] >> [ 48.368139] ======================================================= >> [ 48.372039] [ INFO: possible circular locking dependency detected ] >> [ 48.372039] 2.6.29-rc1-tip-00901-g9699183-dirty #15577 >> [ 48.372039] ------------------------------------------------------- >> [ 48.372039] S99local/3496 is trying to acquire lock: >> [ 48.372039] (microcode_mutex){--..}, at: [<c0118489>] microcode_fini_cpu+0x17/0x2b >> [ 48.372039] >> [ 48.372039] but task is already holding lock: >> [ 48.372039] (&cpu_hotplug.lock){--..}, at: [<c012f508>] cpu_hotplug_begin+0x1f/0x47 >> [ 48.372039] >> [ 48.372039] which lock already depends on the new lock. >> [ 48.372039] >> [ 48.372039] >> [ 48.372039] the existing dependency chain (in reverse order) is: >> [ 48.372039] >> [ 48.372039] -> #1 (&cpu_hotplug.lock){--..}: >> [ 48.372039] [<c014d3f1>] validate_chain+0x8e9/0xb94 >> [ 48.372039] [<c014dd03>] __lock_acquire+0x667/0x6e1 >> [ 48.372039] [<c014ddda>] lock_acquire+0x5d/0x7a >> [ 48.372039] [<c0a6fac3>] mutex_lock_nested+0xdc/0x170 >> [ 48.372039] [<c012f552>] get_online_cpus+0x22/0x34 >> [ 48.372039] [<c013ce08>] work_on_cpu+0x50/0x8a >> [ 48.372039] [<c0118465>] microcode_init_cpu+0x25/0x32 >> [ 48.372039] [<c0118699>] mc_sysdev_add+0x91/0x9b >> [ 48.372039] [<c04cbd09>] sysdev_driver_register+0x9b/0xea > > I'll check more carefully... At the first glance, the presence of > work_on_cpu() looks strange. > > My first idea was that it's used somewhere by request_firmware() but > even assuming some functions might have been inlined (and a call via a > function pointer is not shown either), I don't immediately see how we > might end up with microcode_init_cpu() -> ... -> work_on_cpu(). It was in a commit that (it appears) Ingo has reverted: Subject: x86: cleanup remaining cpumask_t code in microcode_core.c Impact: Reduce problem with changing current->cpus_allowed mask directly. Use "work_on_cpu" to replace instances where set_cpus_allowed_ptr was being used. Signed-off-by: Mike Travis <travis-sJ/iWh9BUns@public.gmane.org> This work_on_cpu is to replace setting current->cpus_allowed when it's only for one cpu. But it has a call to get_online_cpus() that (I believe) is just to keep from offlining the cpu the work function is running on. And it's also the cause of the circular lock dependencies. Thanks, Mike > > I've locked up all the use cases of work_on_cpu() in the current -tip > (about 20), and none of them seem to explain its appearance in the > trace. weird... > > >> Ingo >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L 2008-12-19 23:15 [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L Dmitry Adamushko 2008-12-19 23:30 ` Ingo Molnar @ 2008-12-20 13:19 ` Andreas Mohr [not found] ` <20081220131946.GA31366-p/qQFhXj4MHA4IYVXhSI5GHfThorsUsI@public.gmane.org> 1 sibling, 1 reply; 7+ messages in thread From: Andreas Mohr @ 2008-12-20 13:19 UTC (permalink / raw) To: Dmitry Adamushko Cc: Rafael J. Wysocki, Ingo Molnar, Andreas Mohr, Linux Kernel Mailing List, Kernel Testers List Hi, On Sat, Dec 20, 2008 at 12:15:24AM +0100, Dmitry Adamushko wrote: > I'd appreciate if someone with an appropriate system might give a try to the > 2nd patch (titled "fix a comparison && deadlock..."). > > > In any case, the deadlock situation is a must-have fix. YES, that actually fixed it!! Built -rc9, booted 3 times and suspended, always deadlock upon resume. Patched -rc9, make modules_install, suspended 3 times, always fine. Weird, I thought I had attempted all (most?) of the parts that your patch references before... Thanks a lot for your very responsive handling! I suppose this can get to -urgent now, JIT for the Christmas release? Andreas Mohr ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20081220131946.GA31366-p/qQFhXj4MHA4IYVXhSI5GHfThorsUsI@public.gmane.org>]
* Re: [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L [not found] ` <20081220131946.GA31366-p/qQFhXj4MHA4IYVXhSI5GHfThorsUsI@public.gmane.org> @ 2008-12-20 13:30 ` Ingo Molnar 0 siblings, 0 replies; 7+ messages in thread From: Ingo Molnar @ 2008-12-20 13:30 UTC (permalink / raw) To: Andreas Mohr Cc: Dmitry Adamushko, Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List * Andreas Mohr <andi-5+Cda9B46AM@public.gmane.org> wrote: > Hi, > > On Sat, Dec 20, 2008 at 12:15:24AM +0100, Dmitry Adamushko wrote: > > I'd appreciate if someone with an appropriate system might give a try to the > > 2nd patch (titled "fix a comparison && deadlock..."). > > > > > > In any case, the deadlock situation is a must-have fix. > > YES, that actually fixed it!! > > Built -rc9, booted 3 times and suspended, always deadlock upon resume. > Patched -rc9, make modules_install, suspended 3 times, always fine. great! > Weird, I thought I had attempted all (most?) of the parts that your > patch references before... > > Thanks a lot for your very responsive handling! > > I suppose this can get to -urgent now, JIT for the Christmas release? yeah, i've now moved it over to tip/x86/urgent, thanks Andreas! Ingo ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2009-01-11 20:11 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-19 23:15 [patch] Re: [Bug #12100] resume (S2R) broken by Intel microcode module, on A110L Dmitry Adamushko
2008-12-19 23:30 ` Ingo Molnar
[not found] ` <20081219233006.GA17984-X9Un+BFzKDI@public.gmane.org>
2009-01-11 14:56 ` Ingo Molnar
[not found] ` <20090111145615.GA26173-X9Un+BFzKDI@public.gmane.org>
2009-01-11 19:58 ` Dmitry Adamushko
[not found] ` <b647ffbd0901111158r467f3358wa50f6a6e92d8129f-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2009-01-11 20:11 ` Mike Travis
2008-12-20 13:19 ` Andreas Mohr
[not found] ` <20081220131946.GA31366-p/qQFhXj4MHA4IYVXhSI5GHfThorsUsI@public.gmane.org>
2008-12-20 13:30 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).