All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	tigran@aivazian.fsnet.co.uk, tglx@linutronix.de, mingo@elte.hu,
	hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org,
	Linux PM mailing list <linux-pm@lists.linux-foundation.org>
Subject: Re: [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures
Date: Wed, 05 Oct 2011 14:03:29 +0530	[thread overview]
Message-ID: <4E8C1659.6040208@linux.vnet.ibm.com> (raw)
In-Reply-To: <4E894D75.808@linux.vnet.ibm.com>

On 10/03/2011 11:21 AM, Srivatsa S. Bhat wrote:
> Hi,
> 
> First of all, thank you for your invaluable inputs.
> 
> On 10/03/2011 06:10 AM, Tejun Heo wrote:
>> Hello,
[...]
>>
>> Can't one take a cpu offline, hot unplug it from the board and put in
>> a new one and then bring it online?  That's usually what "hot
>> [un]plug" stands for.  I don't think one would be able to put in a
>> very different processor but maybe, say, revision difference is
>> allowed and microcode should be looked up again?  Assuming that the
>> same microcode can always be applied after the actual CPU is swapped
>> could be dangerous.  Again, I'm not sure about how this is supposed to
>> be managed so I could be wrong.
>>
[...]
>> If I'm not wrong, can't we add synchronization between cpu hotplug and
>> freezer so that they don't happen in parallel?
>>
> 
> I have posted another patch related to CPU hotplug and freezer here:
> https://lkml.org/lkml/2011/10/2/163
> That patch aims to sync up the CPU hotplug infrastructure with the activity of the 
> freezer and hence ensure that the right notifications are sent. Maybe I could easily
> modify that patch to disable CPU hotplugging when the freezer is active.
> 
> But still, that would not solve our problem due to the following possible scenario:
> 
> [Let us assume that we go ahead and invalidate the microcode upon a CPU_DEAD notification,
> like you suggested.] Then, consider this scenario:
> 
> * Take a CPU offline (a pure CPU hotplug operation, not related to suspend).
>   Let us call this CPU X.
>   - This would involve a CPU_DEAD notification, upon which the microcode is invalidated.
> 
> * Start freezing tasks (due to a suspend operation in progress). Now we disable
>   CPU hotplugging.
>  
> * Disable (offline) the non-boot CPUs as part of suspend operation. This sends the
> CPU_DEAD_FROZEN notification, upon which the microcode is NOT invalidated for the other CPUs.
> 
> * Enable (online) the non-boot CPUs as part of resume operation. Please note that the tasks
> are still frozen. This is where we hit the same issue again! When trying to bring that CPU X
> back online, it finds that the microcode has been invalidated and hence tries to request the
> userspace for the microcode, but alas, the userspace is still frozen at that moment.
> This is the exact problem we were trying to solve earlier, which remains unsolved by disabling
> CPU hotplug during freezer operations!
> 

Sorry, my understanding was wrong in this part. After digging some more into
the code, I found that disable_nonboot_cpus() will only offline the currently
online cpus, and enable_nonboot_cpus() will only online those cpus that were
offlined during the disable_nonboot_cpus() phase (which it keeps track by using
a frozen_cpus mask).

So the problem I mentioned above wouldn't arise because during resume,
enable_nonboot_cpus() wouldn't try to online CPU X.

I also found that both disable_nonboot_cpus() and enable_nonboot_cpus() will
disable cpu hotplugging when they are active.

So a suspend/resume on a high level with reference to only freezer and cpu
hotplug, would look something like:


Freeze -> Disable nonboot cpus -> do suspend -> Enable nonboot cpus -> Thaw
         |<------------ CPU hotplugging disabled ----------------->|


So now as we can see, if we just prevent freezer and cpu hotplug from occurring
in parallel (like what Tejun had suggested above), we could solve this whole
issue properly. We wouldn't need anything complicated such as special callbacks
etc which I was proposing in some of my other mails (which anyway wouldn't work
as it is, due to some race conditions that I figured out later)...

So if we go by the above solution, then the situation would look like:

Freeze -> Disable nonboot cpus -> do suspend -> Enable nonboot cpus -> Thaw
|<---------------------- CPU hotplugging disabled ----------------------->|


Let me summarize how this solution works:
1. Any CPU that was offline (either because it was never brought up during
   booting or because due to a cpu hotplug operation, we had offlined it)
   before suspend starts, will never be onlined (or even tried to online)
   till resume is complete. After resume, a suitable cpu hotplug operation
   would put such a cpu online, if necessary.
   So, it doesn't matter whether the kernel has already got the microcode
   for that cpu or not, because it will be onlined only when the resume is
   complete (ie., userspace is thawed).

2. Physical cpu hotplugging or any scenario which needs us to apply a revised
   microcode:
   Here, since we hotplug cpus (online or offline) only when the freezer is not
   active, we won't have any trouble getting microcode from userspace upon cpu
   online.


But then, will introducing this restriction that "freezer and cpu hotplug must
be mutually exclusive", create any problems? (I am asking this question because
I have heard that the freezer is used in some cases other than in suspend/resume
too). I don't seem to find anything obvious... So I guess that restriction
should be quite acceptable...
Any comments?

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

      parent reply	other threads:[~2011-10-05  8:33 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-02 19:05 [BUGFIX][PATCH] Freezer, CPU hotplug, x86 Microcode: Fix task freezing failures Srivatsa S. Bhat
2011-10-02 19:36 ` Rafael J. Wysocki
2011-10-02 19:50 ` Tejun Heo
2011-10-02 20:04   ` Srivatsa S. Bhat
2011-10-03  0:40     ` Tejun Heo
2011-10-03  5:51       ` Srivatsa S. Bhat
2011-10-03  8:47         ` Borislav Petkov
2011-10-04  7:15           ` Tejun Heo
2011-10-04 13:15             ` Srivatsa S. Bhat
2011-10-04 13:46               ` Borislav Petkov
2011-10-04 17:14                 ` Borislav Petkov
2011-10-04 19:49                   ` Rafael J. Wysocki
2011-10-04 20:57                   ` Srivatsa S. Bhat
2011-10-05  7:21                     ` Borislav Petkov
2011-10-05  8:51                       ` Srivatsa S. Bhat
2011-10-05 20:26                         ` Rafael J. Wysocki
2011-10-05 21:15                           ` Srivatsa S. Bhat
2011-10-05 22:43                             ` Rafael J. Wysocki
2011-10-06  6:50                               ` Srivatsa S. Bhat
2011-10-06  8:34                                 ` Borislav Petkov
2011-10-06 15:47                                   ` Srivatsa S. Bhat
2011-10-06 18:11                                     ` Srivatsa S. Bhat
2011-10-06 20:35                                     ` [BUGFIX][PATCH RESEND] " Srivatsa S. Bhat
2011-10-06 22:13                                       ` Tejun Heo
2011-10-06 22:34                                         ` Borislav Petkov
2011-10-07 16:48                                           ` Srivatsa S. Bhat
2011-10-07 18:05                                             ` Borislav Petkov
2011-10-04 13:25             ` [BUGFIX][PATCH] " Borislav Petkov
2011-10-05  8:33         ` Srivatsa S. Bhat [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E8C1659.6040208@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@lists.linux-foundation.org \
    --cc=mingo@elte.hu \
    --cc=rjw@sisk.pl \
    --cc=tglx@linutronix.de \
    --cc=tigran@aivazian.fsnet.co.uk \
    --cc=tj@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.