From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org,
	Linux PM mailing list <linux-pm@lists.linux-foundation.org>,
	oleg@redhat.com, arnd@arndb.de,
	Christoph Lameter <cl@linux-foundation.org>,
	Pekka Enberg <penberg@kernel.org>
Subject: Re: [BUG] CPU hotplug, freezer: Freezing of tasks failed after 20.00 seconds
Date: Mon, 03 Oct 2011 00:43:42 +0530
Message-ID: <4E88B7E6.7080402@linux.vnet.ibm.com>
In-Reply-To: <201109060801.09210.rjw@sisk.pl>

On 09/06/2011 11:31 AM, Rafael J. Wysocki wrote:
> On Tuesday, September 06, 2011, Tejun Heo wrote:
>> Hello, again.
>>
>> On Mon, Sep 05, 2011 at 11:15:12PM +0900, Tejun Heo wrote:
>>>>  Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
>>>>  invert_cpu_stat D 0000000000000000  5304 20435  17329 0x00000084
>>>>   ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
>>>>   ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
>>>>   ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
>>>>  Call Trace:
>>>>   [<ffffffff81532de5>] schedule_timeout+0x235/0x320
>>>>   [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
>>>>   [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
>>>>   [<ffffffff81364486>] _request_firmware+0x156/0x2c0
>>>>   [<ffffffff81364686>] request_firmware+0x16/0x20
>>>>   [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
>>>>   [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
>>>>   [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
>>>>   [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
>>>>   [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
>>>>   [<ffffffff8106d000>] __cpu_notify+0x20/0x40
>>>>   [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
>>>>   [<ffffffff8152d07b>] cpu_up+0xd9/0xec
>>>>   [<ffffffff8151e599>] store_online+0x99/0xd0
>>>>   [<ffffffff81355eb0>] sysdev_store+0x20/0x30
>>>>   [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
>>>>   [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
>>>>   [<ffffffff8117f024>] sys_write+0x54/0xa0
>>>>   [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
>>>
>>> So, this task is trying to bring a CPU up, which triggers firmware
>>> helper to load microcode.  Firmware class currently sleeps
>>> non-interruptibly to wait for firmware load to complete, which is
>>> performed by another userland task.  Now, the PM freezer doesn't
>>> assume that there will be non-freezable wait dependencies among
>>> userland tasks.  It only knows two levels - userland and kernel tasks
>>> - and assumes that the former group may have non-freezable wait
>>> dependency on the latter but there's no such dependency among each
>>> group itself.  If there's such dependency, PM freezer may fail, which
>>> is what happened here.
>>>
>>> ie. the firmware loader userland process got frozen first.
>>> invert_cpu_stat trying to bring up CPU was waiting for the firmware
>>> loader to finish in non-interruptible sleep, so the freezer couldn't
>>> proceed.
>>
>> Hmmm... I went through the code again and usermodehelper_disable()
>> seems to be there to prevent deadlocks like this.  usermode helpers
>> are drained & plugged before freezing is tried.  Rafael, the above
>> shouldn't be happening, right?
> 
> No, it shouldn't in theory, but I'm not sure any more after the recent
> modifications of firmware loading related to the initialization.  I'll have
> a closer look tomorrow.
> 

Hi,
I have posted a fix for this bug at https://lkml.org/lkml/2011/10/2/142
With my fix applied, the numerous "WARNING"s at drivers/base/firmware_class.c
disappear and the task freezing failures are fixed too.
I have tested this for about 10-12 hours, which is much longer than it
took to reproduce the bug earlier.
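
For anyone following along, the wait dependency Tejun describes above can
be illustrated with a toy user-space model. This is only a sketch, not
kernel code: the Task class, try_to_freeze() and drain_helpers() are
hypothetical names made up for illustration, and the model assumes a task
in uninterruptible sleep can be frozen only after the task it waits on
has had a chance to run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    # Name of the task this one sleeps on uninterruptibly, if any.
    waits_on: Optional[str] = None
    frozen: bool = False


def try_to_freeze(tasks: List[Task], max_passes: int = 5) -> List[str]:
    """Toy freezer: return the names of tasks that refuse to freeze."""
    by_name = {t.name: t for t in tasks}
    for _ in range(max_passes):
        progress = False
        for t in tasks:
            if t.frozen:
                continue
            if t.waits_on is not None:
                if by_name[t.waits_on].frozen:
                    # The helper is already frozen, so the wait never ends.
                    continue
                # The helper can still run, so the wait completes.
                t.waits_on = None
                progress = True
                continue
            t.frozen = True
            progress = True
        if not progress:
            break
    return [t.name for t in tasks if not t.frozen]


def drain_helpers(tasks: List[Task]) -> None:
    """Assumed stand-in for draining helpers before freezing starts."""
    for t in tasks:
        t.waits_on = None


# Loader is frozen first, so the task waiting on it gets stuck.
stuck = try_to_freeze([
    Task("firmware_loader"),
    Task("invert_cpu_stat", waits_on="firmware_loader"),
])
print(stuck)

# Same tasks, but pending helper waits are drained before freezing.
tasks = [
    Task("firmware_loader"),
    Task("invert_cpu_stat", waits_on="firmware_loader"),
]
drain_helpers(tasks)
print(try_to_freeze(tasks))
```

In this model the first case leaves invert_cpu_stat unfrozen, while
draining the pending waits first lets both tasks freeze. That is the
ordering usermodehelper_disable() is expected to provide, as discussed
above.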

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

