Re: [PATCH v2] Fix scheduler crash after s3 resume


From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	"Keir (Xen.org)" <keir@xen.org>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Thu, 24 Jan 2013 17:25:09 +0100
Message-ID: <51016065.3080902@citrix.com>
In-Reply-To: <5101630D02000078000B93AD@nat28.tlf.novell.com>

On 24/01/13 16:36, Jan Beulich wrote:
>>>> On 24.01.13 at 15:26, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>  wrote:
>>>>          
>> @@ -212,6 +213,8 @@
>>              BUG_ON(error == -EBUSY);
>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>          }
>> +        if (system_state == SYS_STATE_resume)
>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>      
> This can't be right: What tells you that all CPUs were in pool 0?
>
>    
You're right. In my simple tests this was the case, but in general it 
might not be. Would an approach based on storing the cpupool0 mask in 
disable_nonboot_cpus() and restoring it in enable_nonboot_cpus() be 
more acceptable?
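
To make the proposal concrete, here is a minimal sketch of the save/restore idea. It is a toy model, not Xen code: the toy_* names, the 64-bit mask standing in for cpumask_t, and the fixed CPU count are all assumptions made for illustration. The real disable_nonboot_cpus() and enable_nonboot_cpus() do far more than this.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for Xen's cpumask_t: one bit per CPU (assumption). */
typedef uint64_t toy_cpumask_t;

#define TOY_NR_CPUS 4u

/* Hypothetical model state, not Xen's real data structures. */
static toy_cpumask_t pool0_valid;       /* models cpupool0->cpu_valid */
static toy_cpumask_t saved_pool0_valid; /* snapshot taken at suspend  */

static void toy_set_cpu(unsigned int cpu, toy_cpumask_t *mask)
{
    assert(cpu < TOY_NR_CPUS);
    *mask |= (toy_cpumask_t)1 << cpu;
}

static void toy_clear_cpu(unsigned int cpu, toy_cpumask_t *mask)
{
    assert(cpu < TOY_NR_CPUS);
    *mask &= ~((toy_cpumask_t)1 << cpu);
}

static int toy_test_cpu(unsigned int cpu, toy_cpumask_t mask)
{
    assert(cpu < TOY_NR_CPUS);
    return (int)((mask >> cpu) & 1u);
}

/* Models the suspend side: remember which CPUs were in pool 0,
 * then take every CPU except the boot CPU (CPU 0) out of the pool. */
static void toy_disable_nonboot_cpus(void)
{
    unsigned int cpu;

    saved_pool0_valid = pool0_valid;
    for (cpu = 1; cpu < TOY_NR_CPUS; cpu++)
        toy_clear_cpu(cpu, &pool0_valid);
}

/* Models the resume side: only CPUs that were in pool 0 before
 * suspend are put back, instead of assuming that all of them were. */
static void toy_enable_nonboot_cpus(void)
{
    unsigned int cpu;

    for (cpu = 1; cpu < TOY_NR_CPUS; cpu++)
        if (toy_test_cpu(cpu, saved_pool0_valid))
            toy_set_cpu(cpu, &pool0_valid);
}
```

In this model a CPU that was outside pool 0 before suspend stays outside it after resume, which is the case the unconditional cpumask_set_cpu() in the v2 hunk gets wrong. The sketch does not cover restoring membership of pools other than pool 0.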
> Also, for the future - generating patches with -p helps quite
> a bit in reviewing them.
>
>    
Ok, thanks!
>> --- a/xen/common/schedule.c	Mon Jan 21 17:03:10 2013 +0000
>> +++ b/xen/common/schedule.c	Thu Jan 24 13:40:31 2013 +0000
>> @@ -545,7 +545,7 @@
>>      int    ret = 0;
>>
>>      c = per_cpu(cpupool, cpu);
>> -    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
>> +    if ( c == NULL )
>>          return ret;
>>
>>      for_each_domain_in_cpupool ( d, c )
>> @@ -556,7 +556,8 @@
>>
>>              cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
>>              if ( cpumask_empty(&online_affinity) &&
>> -                 cpumask_test_cpu(cpu, v->cpu_affinity) )
>> +                 cpumask_test_cpu(cpu, v->cpu_affinity) &&
>> +                 system_state != SYS_STATE_suspend )
>>              {
>>                  printk("Breaking vcpu affinity for domain %d vcpu %d\n",
>>                          v->domain->domain_id, v->vcpu_id);
>>      
> I doubt this is correct, as you don't restore any of the settings
> during resume that you tear down here.
>
>    
Is the objection about the affinity part, or also about the (c == NULL) 
bit? The cpu_disable_scheduler() function is currently part of the 
regular CPU-down process, and it was also part of the suspend process 
before the "system state variable" changeset, which regressed it. So the 
(c == NULL) hunk mostly just returns to the previous state, in which this 
worked a lot better (based on empirical testing). But I am no expert on 
this, so I would be grateful for ideas on how this could be fixed in a 
better way!

Just to recap: the current problem boils down, I believe, to the fact 
that the vcpu_wake() function (schedule.c) is still occasionally called 
during the S3 path for CPUs whose per_cpu data has been freed, causing a 
crash. The safest way of fixing it seemed to be to put the suspend-time 
cpu_disable_scheduler() call back under the regular path, though that 
probably isn't the best fix.
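
For reference, the decision made by the second schedule.c hunk can be modelled in isolation. This is only a sketch under stated assumptions: the toy_* names and the 64-bit masks are hypothetical stand-ins for v->cpu_affinity and c->cpu_valid, and the outgoing CPU is assumed to have been cleared from the pool's valid mask already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's system_state values. */
enum toy_system_state {
    TOY_STATE_active,
    TOY_STATE_suspend,
    TOY_STATE_resume
};

/* Returns true when a vcpu's affinity would be reset ("broken")
 * because `cpu` is going offline.
 *
 * vcpu_affinity models v->cpu_affinity, pool_valid models
 * c->cpu_valid (with `cpu` already removed, by assumption). */
static bool toy_should_break_affinity(uint64_t vcpu_affinity,
                                      uint64_t pool_valid,
                                      unsigned int cpu,
                                      enum toy_system_state state)
{
    uint64_t online_affinity = vcpu_affinity & pool_valid;

    assert(cpu < 64);

    return online_affinity == 0 &&                /* nowhere left to run */
           ((vcpu_affinity >> cpu) & 1u) != 0 &&  /* was allowed on cpu  */
           state != TOY_STATE_suspend;            /* keep it across S3   */
}
```

With a vcpu pinned to CPU 1 and CPU 1 going down, the model breaks the affinity during a normal offline but leaves it alone during suspend, so the pinning is still in place when the CPU comes back on resume.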


Thread overview: 26+ messages
2013-01-23 15:51 [PATCH] Fix scheduler crash after s3 resume Tomasz Wroblewski
2013-01-23 16:11 ` Jan Beulich
2013-01-23 16:57   ` Tomasz Wroblewski
2013-01-23 17:01     ` Tomasz Wroblewski
2013-01-23 17:50     ` Tomasz Wroblewski
2013-01-24  6:18 ` Juergen Gross
2013-01-24 14:26   ` [PATCH v2] " Tomasz Wroblewski
2013-01-24 15:36     ` Jan Beulich
2013-01-24 15:57       ` George Dunlap
2013-01-24 16:25       ` Tomasz Wroblewski [this message]
2013-01-24 16:56         ` Jan Beulich
2013-01-25  9:07           ` Tomasz Wroblewski
2013-01-25  9:36             ` Jan Beulich
2013-01-25  9:45               ` Tomasz Wroblewski
2013-01-25 10:15                 ` Jan Beulich
2013-01-25 10:18                   ` Tomasz Wroblewski
2013-01-25 10:29                     ` Jan Beulich
2013-01-25 10:23                   ` Juergen Gross
2013-01-25 10:29                     ` Tomasz Wroblewski
2013-01-25 10:31                     ` Jan Beulich
2013-01-25 10:35                       ` Juergen Gross
2013-01-25 10:40                         ` Jan Beulich
2013-01-25 11:05                           ` Juergen Gross
2013-01-25 11:56                         ` Tomasz Wroblewski
2013-01-25 12:27                           ` Jan Beulich
2013-01-25 13:58                             ` Tomasz Wroblewski
