All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Juergen Gross <juergen.gross@ts.fujitsu.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	"Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: [PATCH v2] Fix scheduler crash after s3 resume
Date: Thu, 24 Jan 2013 15:26:43 +0100	[thread overview]
Message-ID: <510144A3.9060302@citrix.com> (raw)
In-Reply-To: <5100D229.4030906@ts.fujitsu.com>

[-- Attachment #1: Type: text/plain, Size: 2336 bytes --]

Hi, had to made another version of this patch which also fixes the 
additional interminent crash in vcpu_wake(). Would be grateful for 
comments / possible ack again.

This crash was happening when vcpu_wake is called between 
disable_nonboot_cpus() and enable_nonboot_cpus() on s3 path (happens for 
example from the insides of rcu_barrier() sometime). It turns out that 
it was due to vcpu_schedule_lock() accessing per_cpu area, which however 
is freed at this point as it's freed in percpu.c cpu down callback.

I tried the approach of preserving per cpu area on cpu down/up (during 
s3), as well as testing for cpu being online in vcpu_wake before 
acquiring this lock, but ultimately although helping a bit these were 
not fully succesful. So concluded it's probably not really correct to 
let the scheduler run rampart during the disable_nonboot_cpus() / 
enable_nonboot_cpus() window on s3 path and made a new patch version. 
Tested it across many s3 iterations (on lenovo T520), with no problems. 
It should be pretty uninvasive as it only touches S3 path.

Changes from v1:
- modified cpu_disable_scheduler (schedule.c) to run most of stop 
scheduler logic again (i.e. the vcpu migrate). This is partial revert of 
c-s 25079:d5ccb2d1dbd1 . However, breaking of domain vcpu affinities 
seems to be avoidable on this path, so added a condition to do just that

- instead of skipping cpupool0->is_valid cpumask clear on suspend path, 
added restore of that bit on resume path. Moving this was needed because 
otherwise cpu_disable_scheduler() fails to migrate the vcpu (as it's 
still in cpu_valid mask when it's attempted), return EAGAIN and BUG() 
will fire in __cpu_disable()

Signed-off-by: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>

Commit message:
Fix S3 resume regression after C-S 25079:d5ccb2d1dbd1. Regression causes 
either an interminent crash in vcpu_wake (attempt to access 
vcpu_schedule_lock which is in freed per cpu area at this point), or, in 
debug xen, more frequent ASSERT(!cpumask_empty(&cpus)...) firing in 
_csched_cpu_pick.

Fix this by reverting the hunk which turned off disabling cpu scheduler 
on suspend path. Additionally, avoid breaking domain vcpu affinities on 
suspend path. On resume, restore the frozen cpus in cpupool's cpu_valid 
mask, so they can once again be used by scheduler.


[-- Attachment #2: fix-suspend-scheduler-v2 --]
[-- Type: text/plain, Size: 1512 bytes --]

diff -r 4b476378fc35 -r b7fa7fdaad59 xen/common/cpu.c
--- a/xen/common/cpu.c	Mon Jan 21 17:03:10 2013 +0000
+++ b/xen/common/cpu.c	Thu Jan 24 13:40:31 2013 +0000
@@ -5,6 +5,7 @@
 #include <xen/init.h>
 #include <xen/sched.h>
 #include <xen/stop_machine.h>
+#include <xen/sched-if.h>
 
 unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
 #ifndef nr_cpumask_bits
@@ -212,6 +213,8 @@
             BUG_ON(error == -EBUSY);
             printk("Error taking CPU%d up: %d\n", cpu, error);
         }
+        if (system_state == SYS_STATE_resume)
+            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
     }
 
     cpumask_clear(&frozen_cpus);
diff -r 4b476378fc35 -r b7fa7fdaad59 xen/common/schedule.c
--- a/xen/common/schedule.c	Mon Jan 21 17:03:10 2013 +0000
+++ b/xen/common/schedule.c	Thu Jan 24 13:40:31 2013 +0000
@@ -545,7 +545,7 @@
     int    ret = 0;
 
     c = per_cpu(cpupool, cpu);
-    if ( (c == NULL) || (system_state == SYS_STATE_suspend) )
+    if ( c == NULL )
         return ret;
 
     for_each_domain_in_cpupool ( d, c )
@@ -556,7 +556,8 @@
 
             cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
-                 cpumask_test_cpu(cpu, v->cpu_affinity) )
+                 cpumask_test_cpu(cpu, v->cpu_affinity) &&
+                 system_state != SYS_STATE_suspend )
             {
                 printk("Breaking vcpu affinity for domain %d vcpu %d\n",
                         v->domain->domain_id, v->vcpu_id);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2013-01-24 14:26 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-23 15:51 [PATCH] Fix scheduler crash after s3 resume Tomasz Wroblewski
2013-01-23 16:11 ` Jan Beulich
2013-01-23 16:57   ` Tomasz Wroblewski
2013-01-23 17:01     ` Tomasz Wroblewski
2013-01-23 17:50     ` Tomasz Wroblewski
2013-01-24  6:18 ` Juergen Gross
2013-01-24 14:26   ` Tomasz Wroblewski [this message]
2013-01-24 15:36     ` [PATCH v2] " Jan Beulich
2013-01-24 15:57       ` George Dunlap
2013-01-24 16:25       ` Tomasz Wroblewski
2013-01-24 16:56         ` Jan Beulich
2013-01-25  9:07           ` Tomasz Wroblewski
2013-01-25  9:36             ` Jan Beulich
2013-01-25  9:45               ` Tomasz Wroblewski
2013-01-25 10:15                 ` Jan Beulich
2013-01-25 10:18                   ` Tomasz Wroblewski
2013-01-25 10:29                     ` Jan Beulich
2013-01-25 10:23                   ` Juergen Gross
2013-01-25 10:29                     ` Tomasz Wroblewski
2013-01-25 10:31                     ` Jan Beulich
2013-01-25 10:35                       ` Juergen Gross
2013-01-25 10:40                         ` Jan Beulich
2013-01-25 11:05                           ` Juergen Gross
2013-01-25 11:56                         ` Tomasz Wroblewski
2013-01-25 12:27                           ` Jan Beulich
2013-01-25 13:58                             ` Tomasz Wroblewski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=510144A3.9060302@citrix.com \
    --to=tomasz.wroblewski@citrix.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=juergen.gross@ts.fujitsu.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.