From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Juergen Gross <juergen.gross@ts.fujitsu.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] Fix scheduler crash after s3 resume
Date: Fri, 25 Jan 2013 12:56:10 +0100
Message-ID: <510272DA.5030003@citrix.com>
In-Reply-To: <51025FE5.1070200@ts.fujitsu.com>
> The crash happens due to an access to the scheduler percpu area which isn't
> allocated at the moment. The accessed element is the address of the scheduler
> lock for this cpu. Disabling the percpu locking scheme of the scheduler while
> the non-boot cpus are offline will avoid the crash.
>
>
Ok, so I tried this approach (by making the locking in vcpu_wake conditional
on system_state), and whilst it stopped the vcpu_wake crash, I traded it for
a crash in acpi_cpufreq_target.
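For illustration, the change was roughly along the lines of the sketch below.
This is not the exact hunk: the vcpu_schedule_lock_irqsave /
vcpu_schedule_unlock_irqrestore helper names and the SYS_STATE_active check
are my assumptions about how it reads against schedule.c.

/* Hypothetical sketch only, not the actual change: skip the per-cpu
 * scheduler lock while the system is not fully active (e.g. during s3
 * resume, when the non-boot cpus' per-cpu scheduler data is gone), and
 * just keep interrupts off instead. */
void vcpu_wake(struct vcpu *v)
{
    unsigned long flags;
    bool_t use_lock = (system_state == SYS_STATE_active);  /* assumed guard */

    if ( use_lock )
        vcpu_schedule_lock_irqsave(v, flags);
    else
        local_irq_save(flags);

    /* ... original vcpu_wake() body: runstate update + SCHED_OP(wake) ... */

    if ( use_lock )
        vcpu_schedule_unlock_irqrestore(v, flags);
    else
        local_irq_restore(flags);
}

The crash in acpi_cpufreq_target then looks like this: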
(XEN) ----[ Xen-4.3-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff82c4801a0594>] acpi_cpufreq_target+0x165/0x33b
(XEN) RFLAGS: 0000000000010293 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830137bc7300 rcx: 0000000000000000
(XEN) rdx: 0000000000000009 rsi: ffff82c480265460 rdi: ffff830137bd7d60
(XEN) rbp: ffff830137bd7db0 rsp: ffff830137bd7d30 r8: 0000000000000004
(XEN) r9: 00000000fffffffe r10: 0000000000000009 r11: 0000000000000000
(XEN) r12: ffff830137bc7c70 r13: ffff8301025444f8 r14: ffff830137bc7c70
(XEN) r15: 0000000001b5b14c cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 00000000ba674000 cr2: 000000000000004c
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff830137bd7d30:
(XEN) 000000008017d626 0000000000000009 00000009000000fb ffff830100000001
(XEN) ffff830137bd7d60 0000080000000199 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffffffff37bd7da0 00000000ffffffea
(XEN) ffff830137bc7c70 00000000002936c8 0000000006d9e30a 0000000001b5b14c
(XEN) ffff830137bd7df0 ffff82c4801414ee ffff830137bc7c70 0000000000000003
(XEN) ffff830137bd7df0 0000000000000008 0000000000000008 ffff830102ae1340
(XEN) ffff830137bd7e50 ffff82c480140815 ffff8301141624a0 002936c800000286
(XEN) ffff82c480308dc0 ffff830137bc7c70 0000000000000003 ffff830102ae1380
(XEN) ffff830137bebb50 ffff830137bebc00 0000000000000010 0000000000000030
(XEN) ffff830137bd7e70 ffff82c480140a2b ffff830137bd7e70 0000001548c205b8
(XEN) ffff830137bd7ef0 ffff82c4801a31da 0000000000000002
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
(call graph sadly got eaten)
which corresponds to the following lines in cpufreq.c:
freqs.old = perf->states[perf->state].core_frequency * 1000;
freqs.new = data->freq_table[next_state].frequency;
ffff82c4801a058d: 8b 55 94 mov -0x6c(%rbp),%edx
ffff82c4801a0590: 48 8b 43 08 mov 0x8(%rbx),%rax
ffff82c4801a0594: 8b 44 d0 04 mov 0x4(%rax,%rdx,8),%eax
ffff82c4801a0598: 89 45 8c mov %eax,-0x74(%rbp)
ffff82c4801a059b: 48 c7 c0 00 80 ff ff mov $0xffffffffffff8000,%rax
ffff82c4801a05a2: 48 21 e0 and %rsp,%rax
which I guess crashes because either freq_table or data has been freed at this point (the cpufreq driver indeed seems to have some cpu up/down logic which frees them). In fact, with %rax == 0 and %rdx == 9 in the register dump, the faulting address is 9*8 + 4 = 0x4c, which matches cr2, so the freq_table pointer loaded from data reads as NULL here. Given that this is not even the first place in acpi_cpufreq_target where it is accessed, it looks like the cpu got torn down halfway through this function... I suspect there are likely to be more sites affected by this.
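For reference, this is why the fault address works out to 0x4c (sketch only;
the field layout follows the generic cpufreq frequency table definition,
which I'm assuming matches what Xen uses here):

/* Each table entry is 8 bytes, with .frequency at offset 4. */
struct cpufreq_frequency_table {
    unsigned int index;      /* offset 0: driver-specific index */
    unsigned int frequency;  /* offset 4: frequency in kHz */
};

/* With data->freq_table == NULL (%rax) and next_state == 9 (%rdx):
 *   &freq_table[9].frequency == (void *)(9 * 8 + 4) == (void *)0x4c == cr2 */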
I also tried Jan's suggestion of making do_softirq skip its job if we are resuming, but that causes a hang in rcu_barrier(). Adding another resume conditional to rcu_barrier() made it progress further, but it then crashed elsewhere (I don't remember where exactly; this approach looked a bit like a dead end, so I abandoned it quickly).
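Roughly, the guards were along the lines of this sketch (hypothetical only,
not the exact hunks; SYS_STATE_resume and the placement of the checks are my
assumptions about what "skip its job while resuming" looks like):

/* xen/common/softirq.c: skip softirq processing while resuming. */
void do_softirq(void)
{
    if ( system_state == SYS_STATE_resume )   /* assumed guard */
        return;
    /* ... original do_softirq() body ... */
}

/*
 * xen/common/rcupdate.c: the same kind of early-return guard added at
 * the top of rcu_barrier() is what let it get past the hang, before it
 * crashed elsewhere.
 */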
So I still don't have a better solution than reverting the cpu_disable_scheduler() hunk.
Thread overview: 26+ messages
2013-01-23 15:51 [PATCH] Fix scheduler crash after s3 resume Tomasz Wroblewski
2013-01-23 16:11 ` Jan Beulich
2013-01-23 16:57 ` Tomasz Wroblewski
2013-01-23 17:01 ` Tomasz Wroblewski
2013-01-23 17:50 ` Tomasz Wroblewski
2013-01-24 6:18 ` Juergen Gross
2013-01-24 14:26 ` [PATCH v2] " Tomasz Wroblewski
2013-01-24 15:36 ` Jan Beulich
2013-01-24 15:57 ` George Dunlap
2013-01-24 16:25 ` Tomasz Wroblewski
2013-01-24 16:56 ` Jan Beulich
2013-01-25 9:07 ` Tomasz Wroblewski
2013-01-25 9:36 ` Jan Beulich
2013-01-25 9:45 ` Tomasz Wroblewski
2013-01-25 10:15 ` Jan Beulich
2013-01-25 10:18 ` Tomasz Wroblewski
2013-01-25 10:29 ` Jan Beulich
2013-01-25 10:23 ` Juergen Gross
2013-01-25 10:29 ` Tomasz Wroblewski
2013-01-25 10:31 ` Jan Beulich
2013-01-25 10:35 ` Juergen Gross
2013-01-25 10:40 ` Jan Beulich
2013-01-25 11:05 ` Juergen Gross
2013-01-25 11:56 ` Tomasz Wroblewski [this message]
2013-01-25 12:27 ` Jan Beulich
2013-01-25 13:58 ` Tomasz Wroblewski