From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: rjw@sisk.pl, bp@amd64.org, pavel@ucw.cz, len.brown@intel.com,
	tj@kernel.org, mingo@elte.hu, a.p.zijlstra@chello.nl,
	akpm@linux-foundation.org, suresh.b.siddha@intel.com,
	lucas.demarchi@profusion.mobi, rusty@rustcorp.com.au,
	rdunlap@xenotime.net, vatsa@linux.vnet.ibm.com,
	ashok.raj@intel.com, tigran@aivazian.fsnet.co.uk,
	tglx@linutronix.de, hpa@zytor.com, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures
Date: Mon, 10 Oct 2011 21:02:40 +0530
Message-ID: <4E931018.8030904@linux.vnet.ibm.com>
In-Reply-To: <Pine.LNX.4.44L0.1110101119550.18069-100000@netrider.rowland.org>

On 10/10/2011 08:53 PM, Alan Stern wrote:
> On Mon, 10 Oct 2011, Srivatsa S. Bhat wrote:
> 
>> When CPU hotplug is run along with suspend/hibernate tests using
>> the pm_test framework, even at the freezer level, we hit task freezing
>> failures. One such failure was reported here:
>> https://lkml.org/lkml/2011/9/5/28
>>
>> An excerpt of the log:
>>
>>   Freezing of tasks failed after 20.01 seconds (2 tasks refusing to
>>   freeze, wq_busy=0):
>>   invert_cpu_stat D 0000000000000000  5304 20435  17329 0x00000084
>>    ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
>>    ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
>>    ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
>>   Call Trace:
>>    [<ffffffff81532de5>] schedule_timeout+0x235/0x320
>>    [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
>>    [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
>>    [<ffffffff81364486>] _request_firmware+0x156/0x2c0
>>    [<ffffffff81364686>] request_firmware+0x16/0x20
>>    [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
>>    [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
>>    [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
>>    [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
>>    [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
>>    [<ffffffff8106d000>] __cpu_notify+0x20/0x40
>>    [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
>>    [<ffffffff8152d07b>] cpu_up+0xd9/0xec
>>    [<ffffffff8151e599>] store_online+0x99/0xd0
>>    [<ffffffff81355eb0>] sysdev_store+0x20/0x30
>>    [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
>>    [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
>>    [<ffffffff8117f024>] sys_write+0x54/0xa0
>>    [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
>>
>>
>> The reason behind this failure is explained below:
>>
>> The x86 microcode update driver has callbacks registered for CPU hotplug
>> events such as a CPU getting offlined or onlined. Things go wrong when a
>> CPU hotplug stress test runs while a suspend/resume operation is in
>> progress. Upon getting a CPU_DEAD notification (for example, when a CPU
>> is offlined while tasks are not frozen), the microcode callback frees the
>> microcode and invalidates it. Later, when that CPU is onlined while tasks
>> are frozen, the microcode callback (for the CPU_ONLINE_FROZEN event) tries
>> to apply the microcode to the CPU, doesn't find it, and hence depends on
>> the (currently frozen) userspace to get the microcode again. This leads
>> to numerous "WARNING"s at drivers/base/firmware_class.c, which eventually
>> lead to task freezing failures in the suspend code path, as reported.
>>
>> So, this patch series addresses this issue by ensuring that CPU hotplug and
>> suspend/hibernate don't run in parallel, thereby fixing the task freezing
>> failures.
> 
> This seems like entirely the wrong way to go about solving this problem.
> 
> The kernel shouldn't be responsible for making hotplug stress tests 
> exclusive with system sleep.  Whoever is running those tests should be 
> smart enough to realize what's wrong if system sleep interferes with a 
> test.
> 
> Furthermore, if the entire problem is lack of CPU microcode, hasn't 
> that been fixed already?  There recently was a patch to avoid releasing 
> microcode after it was first loaded -- the idea being that there would 
> then be no need to get the microcode from userspace again at awkward 
> times while the system is resuming.
> 

Well, that was the first version of this patch itself :)
I forgot to include a link to it in the cover letter:
http://thread.gmane.org/gmane.linux.kernel/1198291/focus=1200591

That was my first idea: to avoid releasing microcode after it was first
loaded. But Tejun and Borislav felt that a better way to fix the problem
would be to mutually exclude CPU hotplug and suspend/hibernate.
Later on, Borislav acked that one-line patch on the grounds that, even
though it was not the best solution for the bug, it is an optimization
in its own right.
So I reposted that one-line patch with a revised motivation:
http://thread.gmane.org/gmane.linux.kernel/1200882
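
To make the difference between the two approaches concrete, here is a rough
single-threaded C model of the failing sequence (offline a CPU while tasks
run, start a suspend, then online the CPU). It is only a sketch under stated
assumptions: struct mc_model, mc_cpu_dead(), mc_cpu_online(),
model_cpu_down(), model_cpu_up(), model_suspend_prepare() and run_scenario()
are hypothetical names, not kernel APIs, and the request_firmware() call that
blocks forever on frozen userspace is modelled as an -EAGAIN return. Option 1
is "keep the microcode across CPU_DEAD" (the one-line patch); option 2 is
"refuse CPU hotplug during suspend/hibernate" (this series).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct mc_model {
	bool cached;           /* microcode image still held in kernel memory */
	bool tasks_frozen;     /* freezer active: userspace cannot supply firmware */
	bool hotplug_disabled; /* option 2: hotplug refused during suspend */
	bool keep_on_dead;     /* option 1: keep microcode across CPU_DEAD */
	int fw_requests;       /* times userspace was asked for firmware */
};

/* CPU_DEAD callback: unless option 1 is on, drop the cached image. */
void mc_cpu_dead(struct mc_model *m)
{
	if (!m->keep_on_dead)
		m->cached = false;
}

/*
 * CPU online callback: reuse the cached image if there is one, otherwise
 * ask userspace. With userspace frozen that request cannot complete; the
 * model returns -EAGAIN instead of blocking as in the reported trace.
 */
int mc_cpu_online(struct mc_model *m)
{
	if (m->cached)
		return 0;
	m->fw_requests++;
	if (m->tasks_frozen)
		return -EAGAIN;
	m->cached = true; /* userspace delivered the firmware */
	return 0;
}

/* sysfs-style hotplug entry points, gated by option 2. */
int model_cpu_down(struct mc_model *m)
{
	if (m->hotplug_disabled)
		return -EBUSY;
	mc_cpu_dead(m);
	return 0;
}

int model_cpu_up(struct mc_model *m)
{
	if (m->hotplug_disabled)
		return -EBUSY;
	return mc_cpu_online(m);
}

/* Start of suspend/hibernate: optionally block hotplug, then freeze tasks. */
void model_suspend_prepare(struct mc_model *m, bool exclude_hotplug)
{
	if (exclude_hotplug)
		m->hotplug_disabled = true;
	m->tasks_frozen = true;
}

/*
 * The reported sequence: offline a CPU while tasks run, start a suspend,
 * then try to online the CPU. Returns the online result and stores how
 * many firmware requests were sent to (frozen) userspace.
 */
int run_scenario(bool keep_on_dead, bool exclude_hotplug, int *fw_requests)
{
	struct mc_model m = { .cached = true, .keep_on_dead = keep_on_dead };
	int rc = model_cpu_down(&m);

	assert(rc == 0); /* the offline happens before the freezer starts */
	model_suspend_prepare(&m, exclude_hotplug);
	rc = model_cpu_up(&m);
	*fw_requests = m.fw_requests;
	return rc;
}
```

In this model, run_scenario(false, false, &n) reproduces the bug (-EAGAIN
with one firmware request to frozen userspace), run_scenario(true, false, &n)
onlines the CPU from the cached image with no request at all, and
run_scenario(false, true, &n) makes the hotplug attempt fail with -EBUSY
instead of letting the freezer time out.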

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab



Thread overview: 40+ messages
2011-10-10 12:31 [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures Srivatsa S. Bhat
2011-10-10 12:32 ` [PATCH v2 1/3] Introduce helper functions Srivatsa S. Bhat
2011-10-10 12:33 ` [PATCH v2 2/3] Mutually exclude cpu online and suspend/hibernate Srivatsa S. Bhat
2011-10-10 12:45   ` Srivatsa S. Bhat
2011-10-10 14:26     ` Peter Zijlstra
2011-10-10 15:16       ` Srivatsa S. Bhat
2011-10-11 20:32         ` Srivatsa S. Bhat
2011-10-11 21:56           ` Rafael J. Wysocki
2011-10-12  3:57             ` Srivatsa S. Bhat
2011-10-12 19:31               ` Rafael J. Wysocki
2011-10-12 21:25                 ` Srivatsa S. Bhat
2011-10-12 22:09                   ` Rafael J. Wysocki
2011-10-13 15:42                     ` Srivatsa S. Bhat
2011-10-13 16:06                       ` Tejun Heo
2011-10-13 17:01                         ` Borislav Petkov
2011-10-13 17:29                           ` Srivatsa S. Bhat
2011-10-19 17:29                             ` Srivatsa S. Bhat
2011-10-13 18:03                           ` Alan Stern
2011-10-13 19:07                             ` Rafael J. Wysocki
2011-10-13 19:08                         ` Rafael J. Wysocki
2011-10-10 15:25       ` Alan Stern
2011-10-10 17:00     ` Tejun Heo
2011-10-11  9:18       ` Peter Zijlstra
2011-10-11  9:37         ` Srivatsa S. Bhat
2011-10-10 12:33 ` [PATCH v2 3/3] Update documentation Srivatsa S. Bhat
2011-10-10 15:23 ` [PATCH v2 0/3] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures Alan Stern
2011-10-10 15:32   ` Srivatsa S. Bhat [this message]
2011-10-10 16:53     ` Borislav Petkov
2011-10-10 17:14       ` Pavel Machek
2011-10-10 17:30       ` Srivatsa S. Bhat
2011-10-10 17:53         ` Borislav Petkov
2011-10-10 18:08           ` tj
2011-10-10 18:34             ` Borislav Petkov
2011-10-10 18:45               ` Srivatsa S. Bhat
2011-10-10 18:53               ` tj
2011-10-10 19:00                 ` Srivatsa S. Bhat
2011-10-10 20:35                   ` Borislav Petkov
     [not found]                 ` <20111010202913.GA30798@aftab>
2011-10-10 21:13                   ` tj
2011-10-11  9:17       ` Peter Zijlstra
2011-10-10 16:57   ` Tejun Heo
