Re: [PATCH v2] Fix scheduler crash after s3 resume

From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	"Keir (Xen.org)" <keir@xen.org>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Thu, 24 Jan 2013 17:25:09 +0100
Message-ID: <51016065.3080902@citrix.com>
In-Reply-To: <5101630D02000078000B93AD@nat28.tlf.novell.com>

On 24/01/13 16:36, Jan Beulich wrote:
>>>> On 24.01.13 at 15:26, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>>>>          
>> @@ -212,6 +213,8 @@
>>              BUG_ON(error == -EBUSY);
>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>          }
>> +        if (system_state == SYS_STATE_resume)
>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>      
> This can't be right: What tells you that all CPUs were in pool 0?
>
>    
You're right. In my simple tests this was the case, but in general it 
might not be. Would an approach based on saving the cpupool0 mask in 
disable_nonboot_cpus() and restoring it in enable_nonboot_cpus() be more 
acceptable?
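
To make the idea concrete, here is a toy sketch of the save/restore 
approach. It is not Xen code: a 64-bit word stands in for a cpumask, and 
every toy_* name, as well as saved_pool0_valid, is made up for 
illustration. It only assumes that the pool 0 mask can be snapshotted 
before the non-boot CPUs go down and consulted when they come back up.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in, NOT a Xen type: one 64-bit word plays the role of a cpumask. */
typedef uint64_t toy_cpumask_t;

#define TOY_NR_CPUS 64u

static toy_cpumask_t pool0_valid;       /* stands in for cpupool0->cpu_valid */
static toy_cpumask_t saved_pool0_valid; /* hypothetical snapshot taken at suspend */

static void toy_set_cpu(unsigned int cpu, toy_cpumask_t *mask)
{
    assert(cpu < TOY_NR_CPUS);
    *mask |= (toy_cpumask_t)1 << cpu;
}

static void toy_clear_cpu(unsigned int cpu, toy_cpumask_t *mask)
{
    assert(cpu < TOY_NR_CPUS);
    *mask &= ~((toy_cpumask_t)1 << cpu);
}

static int toy_test_cpu(unsigned int cpu, const toy_cpumask_t *mask)
{
    assert(cpu < TOY_NR_CPUS);
    return (int)((*mask >> cpu) & 1u);
}

/* Suspend side: remember pool 0 membership, then take non-boot CPUs down. */
static void toy_disable_nonboot_cpus(void)
{
    unsigned int cpu;

    saved_pool0_valid = pool0_valid;
    for (cpu = 1; cpu < TOY_NR_CPUS; cpu++)
        toy_clear_cpu(cpu, &pool0_valid);
}

/* Resume side: only CPUs that were in pool 0 before suspend are put back. */
static void toy_enable_nonboot_cpus(void)
{
    unsigned int cpu;

    for (cpu = 1; cpu < TOY_NR_CPUS; cpu++)
        if (toy_test_cpu(cpu, &saved_pool0_valid))
            toy_set_cpu(cpu, &pool0_valid);
}
```

With this shape, a CPU that was in another pool (or in no pool) before 
suspend is never added to pool 0 on resume, which is the case the 
unconditional cpumask_set_cpu() in v2 gets wrong.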
> Also, for the future - generating patches with -p helps quite
> a bit in reviewing them.
>
>    
Ok, thanks!
>> --- a/xen/common/schedule.c	Mon Jan 21 17:03:10 2013 +0000
>> +++ b/xen/common/schedule.c	Thu Jan 24 13:40:31 2013 +0000
>> @@ -545,7 +545,7 @@
>>      int    ret = 0;
>>
>>      c = per_cpu(cpupool, cpu);
>> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
>> +    if ( c == NULL )
>>          return ret;
>>
>>      for_each_domain_in_cpupool ( d, c )
>> @@ -556,7 +556,8 @@
>>
>>              cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
>>              if ( cpumask_empty(&online_affinity) &&
>> -                 cpumask_test_cpu(cpu, v->cpu_affinity) )
>> +                 cpumask_test_cpu(cpu, v->cpu_affinity) &&
>> +                 system_state != SYS_STATE_suspend )
>>              {
>>                  printk("Breaking vcpu affinity for domain %d vcpu %d\n",
>>                          v->domain->domain_id, v->vcpu_id);
>>      
> I doubt this is correct, as you don't restore any of the settings
> during resume that you tear down here.
>
>    
Is the objection about the affinity part, or also about the (c == NULL) 
bit? The cpu_disable_scheduler() function is currently part of the regular 
CPU-down process, and it was also part of the suspend process before the 
"system state variable" changeset, which regressed it. So the (c == NULL) 
hunk mostly just returns to the previous state, where this worked a lot 
better (judging by empirical testing). I am no expert on this, though, so 
I would be grateful for ideas on how this could be fixed in a better way!

Just to recap: I believe the current problem boils down to the fact that 
the vcpu_wake() function (schedule.c) occasionally keeps getting called 
during the S3 path for CPUs whose per_cpu data has already been freed, 
causing a crash. The safest fix seemed to be to put the suspend-time 
cpu_disable_scheduler() call back on the regular path, though that is 
probably not the best solution.
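
As a rough illustration of the failure mode, here is a toy model. It is 
not Xen code: all toy_* names are invented, CPU 0 is assumed to be the 
boot CPU that stays up, and the NULL check in toy_vcpu_wake() exists only 
so that the model reports the problem instead of crashing. It contrasts 
taking a CPU down with and without first moving its vcpus elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS  4u
#define TOY_NR_VCPUS 3u

/* Stands in for the per-CPU scheduler data that is freed on CPU down. */
struct toy_sched_data { int runq_len; };

static struct toy_sched_data *toy_percpu[TOY_NR_CPUS];
static unsigned int toy_vcpu_processor[TOY_NR_VCPUS]; /* CPU each vcpu is assigned to */

static void toy_cpu_up(unsigned int cpu)
{
    assert(cpu < TOY_NR_CPUS);
    if (toy_percpu[cpu] == NULL) {
        toy_percpu[cpu] = calloc(1, sizeof(*toy_percpu[cpu]));
        assert(toy_percpu[cpu] != NULL);
    }
}

/* Move every vcpu off 'cpu'; in this toy they all land on CPU 0. */
static void toy_disable_scheduler(unsigned int cpu)
{
    unsigned int v;

    for (v = 0; v < TOY_NR_VCPUS; v++)
        if (toy_vcpu_processor[v] == cpu)
            toy_vcpu_processor[v] = 0;
}

/* Take a CPU down, optionally migrating its vcpus away first. */
static void toy_cpu_down(unsigned int cpu, int migrate_first)
{
    assert(cpu < TOY_NR_CPUS);
    if (migrate_first)
        toy_disable_scheduler(cpu);
    free(toy_percpu[cpu]);
    toy_percpu[cpu] = NULL;
}

/* Returns 0 on success, -1 if the wake would have touched freed data. */
static int toy_vcpu_wake(unsigned int v)
{
    struct toy_sched_data *sd;

    assert(v < TOY_NR_VCPUS);
    sd = toy_percpu[toy_vcpu_processor[v]];
    if (sd == NULL)
        return -1;
    sd->runq_len++;
    return 0;
}
```

If CPU 1 goes down without migration while a vcpu is still assigned to 
it, a later toy_vcpu_wake() for that vcpu hits the freed slot. With 
migration first, the same wake lands on CPU 0 and succeeds.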
