From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Juergen Gross <juergen.gross@ts.fujitsu.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Fri, 25 Jan 2013 12:56:10 +0100
Message-ID: <510272DA.5030003@citrix.com>
In-Reply-To: <51025FE5.1070200@ts.fujitsu.com>
> The crash happens due to an access to the scheduler percpu area which isn't
> allocated at the moment. The accessed element is the address of the scheduler
> lock for this cpu. Disabling the percpu locking scheme of the scheduler while
> the non-boot cpus are offline will avoid the crash.
>
>
Ok, so I tried this approach (by making the locking in vcpu_wake conditional
on system_state), and whilst it stopped the vcpu_wake crash, I traded it for
a crash in acpi_cpufreq_target.
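For illustration, the change was roughly along the lines of the sketch below.
This is not the exact hunk: the vcpu_schedule_lock_irqsave /
vcpu_schedule_unlock_irqrestore helper names and the SYS_STATE_active check
are my assumptions about how it reads against schedule.c.

/* Hypothetical sketch only, not the actual change: skip the per-cpu
 * scheduler lock while the system is not fully active (e.g. during s3
 * resume, when the non-boot cpus' per-cpu scheduler data is gone), and
 * just keep interrupts off instead. */
void vcpu_wake(struct vcpu *v)
{
    unsigned long flags;
    bool_t use_lock = (system_state == SYS_STATE_active);  /* assumed guard */

    if ( use_lock )
        vcpu_schedule_lock_irqsave(v, flags);
    else
        local_irq_save(flags);

    /* ... original vcpu_wake() body: runstate update + SCHED_OP(wake) ... */

    if ( use_lock )
        vcpu_schedule_unlock_irqrestore(v, flags);
    else
        local_irq_restore(flags);
}

The crash in acpi_cpufreq_target then looks like this: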
(XEN) ----[ Xen-4.3-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82c4801a0594>] acpi_cpufreq_target+0x165/0x33b
(XEN) RFLAGS: 0000000000010293 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830137bc7300 rcx: 0000000000000000
(XEN) rdx: 0000000000000009 rsi: ffff82c480265460 rdi: ffff830137bd7d60
(XEN) rbp: ffff830137bd7db0 rsp: ffff830137bd7d30 r8: 0000000000000004
(XEN) r9: 00000000fffffffe r10: 0000000000000009 r11: 0000000000000000
(XEN) r12: ffff830137bc7c70 r13: ffff8301025444f8 r14: ffff830137bc7c70
(XEN) r15: 0000000001b5b14c cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 00000000ba674000 cr2: 000000000000004c
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff830137bd7d30:
(XEN) 000000008017d626 0000000000000009 00000009000000fb ffff830100000001
(XEN) ffff830137bd7d60 0000080000000199 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffffffff37bd7da0 00000000ffffffea
(XEN) ffff830137bc7c70 00000000002936c8 0000000006d9e30a 0000000001b5b14c
(XEN) ffff830137bd7df0 ffff82c4801414ee ffff830137bc7c70 0000000000000003
(XEN) ffff830137bd7df0 0000000000000008 0000000000000008 ffff830102ae1340
(XEN) ffff830137bd7e50 ffff82c480140815 ffff8301141624a0 002936c800000286
(XEN) ffff82c480308dc0 ffff830137bc7c70 0000000000000003 ffff830102ae1380
(XEN) ffff830137bebb50 ffff830137bebc00 0000000000000010 0000000000000030
(XEN) ffff830137bd7e70 ffff82c480140a2b ffff830137bd7e70 0000001548c205b8
(XEN) ffff830137bd7ef0 ffff82c4801a31da 0000000000000002
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
(call graph sadly got eaten)
which corresponds to the following lines in cpufreq.c:
freqs.old = perf->states[perf->state].core_frequency * 1000;
freqs.new = data->freq_table[next_state].frequency;
ffff82c4801a058d: 8b 55 94 mov -0x6c(%rbp),%edx
ffff82c4801a0590: 48 8b 43 08 mov 0x8(%rbx),%rax
ffff82c4801a0594: 8b 44 d0 04 mov 0x4(%rax,%rdx,8),%eax
ffff82c4801a0598: 89 45 8c mov %eax,-0x74(%rbp)
ffff82c4801a059b: 48 c7 c0 00 80 ff ff mov $0xffffffffffff8000,%rax
ffff82c4801a05a2: 48 21 e0 and %rsp,%rax
which I guess crashes because either freq_table or data has been freed at this point (the cpufreq driver indeed seems to have some cpu up/down logic which frees them). In fact, with %rax == 0 and %rdx == 9 in the register dump, the faulting address is 9*8 + 4 = 0x4c, which matches cr2, so the freq_table pointer loaded from data reads as NULL here. Given that this is not even the first place in acpi_cpufreq_target where it is accessed, it looks like the cpu got torn down halfway through this function... I suspect there are likely to be more sites affected by this.
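For reference, this is why the fault address works out to 0x4c (sketch only;
the field layout follows the generic cpufreq frequency table definition,
which I'm assuming matches what Xen uses here):

/* Each table entry is 8 bytes, with .frequency at offset 4. */
struct cpufreq_frequency_table {
    unsigned int index;      /* offset 0: driver-specific index */
    unsigned int frequency;  /* offset 4: frequency in kHz */
};

/* With data->freq_table == NULL (%rax) and next_state == 9 (%rdx):
 *   &freq_table[9].frequency == (void *)(9 * 8 + 4) == (void *)0x4c == cr2 */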
I also tried Jan's suggestion of making do_softirq skip its job if we are resuming, but that causes a hang in rcu_barrier(). Adding another resume conditional to rcu_barrier() made it progress further, but it then crashed elsewhere (I don't remember where exactly; this approach looked a bit like a dead end, so I abandoned it quickly).
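Roughly, the guards were along the lines of this sketch (hypothetical only,
not the exact hunks; SYS_STATE_resume and the placement of the checks are my
assumptions about what "skip its job while resuming" looks like):

/* xen/common/softirq.c: skip softirq processing while resuming. */
void do_softirq(void)
{
    if ( system_state == SYS_STATE_resume )   /* assumed guard */
        return;
    /* ... original do_softirq() body ... */
}

/*
 * xen/common/rcupdate.c: the same kind of early-return guard added at
 * the top of rcu_barrier() is what let it get past the hang, before it
 * crashed elsewhere.
 */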
So I still don't have a better solution than reverting the cpu_disable_scheduler() hunk.
Thread overview: 26+ messages
2013-01-23 15:51 [PATCH] Fix scheduler crash after s3 resume Tomasz Wroblewski
2013-01-23 16:11 ` Jan Beulich
2013-01-23 16:57 ` Tomasz Wroblewski
2013-01-23 17:01 ` Tomasz Wroblewski
2013-01-23 17:50 ` Tomasz Wroblewski
2013-01-24 6:18 ` Juergen Gross
2013-01-24 14:26 ` [PATCH v2] " Tomasz Wroblewski
2013-01-24 15:36 ` Jan Beulich
2013-01-24 15:57 ` George Dunlap
2013-01-24 16:25 ` Tomasz Wroblewski
2013-01-24 16:56 ` Jan Beulich
2013-01-25 9:07 ` Tomasz Wroblewski
2013-01-25 9:36 ` Jan Beulich
2013-01-25 9:45 ` Tomasz Wroblewski
2013-01-25 10:15 ` Jan Beulich
2013-01-25 10:18 ` Tomasz Wroblewski
2013-01-25 10:29 ` Jan Beulich
2013-01-25 10:23 ` Juergen Gross
2013-01-25 10:29 ` Tomasz Wroblewski
2013-01-25 10:31 ` Jan Beulich
2013-01-25 10:35 ` Juergen Gross
2013-01-25 10:40 ` Jan Beulich
2013-01-25 11:05 ` Juergen Gross
2013-01-25 11:56 ` Tomasz Wroblewski [this message]
2013-01-25 12:27 ` Jan Beulich
2013-01-25 13:58 ` Tomasz Wroblewski