From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
Dario Faggioli <dario.faggioli@citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
Date: Tue, 3 Dec 2013 19:53:58 +0000
Message-ID: <529E36D6.1010509@citrix.com>
In-Reply-To: <529CA8F00200007800108CDA@nat28.tlf.novell.com>
On 02/12/13 14:36, Jan Beulich wrote:
>>>> On 02.12.13 at 15:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> After some more investigation, this is not a regression at all, although
>> the patch is directly relevant to identifying the problem.
>>
>> PXELINUX 4.04 2011-04-18 Copyright (C) 1994-2011 H. Peter Anvin et al
>> boot:
>> Loading xenrt/xen-minnow.gz... ok
>> Loading xenrt/vmlinuz... ok
>> After multiboot magic check
>> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
>> Before lret into trampoline
>> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
>> After (failed) conditional jmp to start_secondary
>> Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
>> __ __ _ _ _____ _
>> \ \/ /___ _ __ | || | |___ / / |
>> \ // _ \ '_ \ | || |_ |_ \ | |
>>
>>
>> Something between entering the trampoline and emerging in 64bit mode is
>> corrupting a single byte at phys 0x105ff1 from its correct value to a
>> value of 0x86.
>>
>> The corruption disappears if the "no-real-mode" option is used.
> And I'd say the primary suspect is
>
> /*
> * Declare that our target operating mode is long mode.
> * Initialise 32-bit registers since some buggy BIOSes depend on it.
> */
> movl $0xec00,%eax # declare target operating mode
> movl $0x0002,%ebx # long mode
> int $0x15
>
> considering that 0x86 is a relatively common "function not
> implemented" indicator for BIOS (namely INT 15) functions.
>
> As a possible workaround I'd consider trying
> a) zeroing %esp rather than just %sp a few lines up from the
> above quoted code
> b) zeroing the high halves of all registers
>
> Jan
>
Your suspicion is entirely correct. I have positively identified this
`int $0x15` call as the source of the corruption: the byte is fine
immediately before the call and bad immediately afterwards.
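As a cross-check on the address, here is a minimal C sketch that diffs the two opcode dumps quoted earlier (the "Before lret into trampoline" and "After (failed) conditional jmp" lines) and recovers the corrupted location. The helper `first_diff` and the array names are hypothetical, written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first byte that differs between two dumps of
 * the same memory range, or -1 if the dumps are identical. */
static long first_diff(const uint8_t *before, const uint8_t *after, size_t len)
{
    assert(before != NULL && after != NULL);
    for ( size_t i = 0; i < len; i++ )
        if ( before[i] != after[i] )
            return (long)i;
    return -1;
}

/* Opcode bytes starting at phys 0x105fef, copied from the debug output
 * quoted earlier (before entering the trampoline, and after 64bit entry). */
static const uint8_t dump_before[8] = { 0x97, 0x0e, 0x00, 0x00, 0x49, 0x8d, 0xbe, 0xb0 };
static const uint8_t dump_after[8]  = { 0x97, 0x0e, 0x86, 0x00, 0x49, 0x8d, 0xbe, 0xb0 };
static const unsigned long dump_base = 0x105fefUL;
```

With these dumps, `first_diff(dump_before, dump_after, 8)` returns 2, which places the corrupted byte at 0x105fef + 2 = 0x105ff1 (0x00 changed to 0x86), matching the address identified earlier.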
I have further confirmed that zeroing all 32 bits of the GPRs before
issuing the interrupt fixes the issue.
In an attempt to understand what is going on, I added debugging to dump
the entire register/selector state before and after the call, to see
whether anything looked like a smoking gun.
(XEN) Pre-state:
(XEN) eax 00007600 ebx 00000000 ecx 00000000 edx 00007600
(XEN) esi 0028b0c4 edi 00078a80 esp 00080000 ebp 00000000
(XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600
If the GPRs are left as they are, the post-state looks like:
(XEN) Post-state:
(XEN) eax 00008600 ebx 00000000 ecx 00000000 edx 00007600
(XEN) esi 0028b0c4 edi 00078a70 esp 00080000 ebp 00000000
(XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600
If the GPRs are zeroed as far as possible, the post-state looks like:
(XEN) Post-state:
(XEN) eax 00008600 ebx 00000000 ecx 00000000 edx 00000000
(XEN) esi 00000000 edi 00000000 esp 00000000 ebp 00000000
(XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600
In both cases the carry flag is set, which is consistent with a return
value of 0x86 in %ah.
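For reference, a minimal C sketch of how the post-state decodes under the usual INT 15h convention (CF set on error, status code in %ah). The helper names are hypothetical, and the EFLAGS value passed in any example call is an assumption, since the dumps above do not include EFLAGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The carry flag is bit 0 of EFLAGS. */
#define X86_EFLAGS_CF 0x00000001u

/* %ah is bits 15:8 of %eax. */
static uint8_t reg_ah(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xff);
}

/* Conventional INT 15h failure report: CF set, status code in %ah.
 * 0x86 is the common "function not supported" status. */
static bool int15_unsupported(uint32_t eax, uint32_t eflags)
{
    return (eflags & X86_EFLAGS_CF) != 0 && reg_ah(eax) == 0x86;
}
```

For the post-state %eax of 0x00008600, `reg_ah()` yields 0x86, so with CF set `int15_unsupported()` returns true in both the "left as is" and "zeroed" cases.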
I iterated through the registers one at a time, and proved that it is
%esp specifically which causes the problem.
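The pre-state dump shows why Jan's suggestion (a) matters: %esp is 0x00080000, so %sp is already zero while the upper half is not. A minimal C sketch of that arithmetic follows; the helper names are hypothetical, it models only the register value, and the idea that this BIOS consumes the upper half of %esp is an assumption consistent with the observed fix rather than something established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zeroing only the 16-bit %sp clears bits 15:0 of %esp and leaves
 * bits 31:16 untouched. */
static uint32_t zero_sp_only(uint32_t esp)
{
    return esp & 0xffff0000u;
}

/* True when %sp reads as zero but the full 32-bit %esp does not, i.e.
 * code using only %sp and code using all of %esp see different stacks. */
static bool esp_high_half_dirty(uint32_t esp)
{
    return (esp & 0xffffu) == 0 && esp != 0;
}
```

With the pre-state value 0x00080000, `zero_sp_only()` leaves %esp unchanged and `esp_high_half_dirty()` returns true; only clearing the full register (for example with a 32-bit `xor`) makes both views agree.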
I shall submit a patch against trampoline.S shortly.
~Andrew
Thread overview: 12+ messages
2013-11-28 12:31 Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode" Andrew Cooper
2013-11-28 13:05 ` Dario Faggioli
2013-11-28 15:09 ` George Dunlap
2013-11-28 15:14 ` Dario Faggioli
2013-11-28 15:16 ` Andrew Cooper
2013-11-28 21:17 ` Andrew Cooper
2013-11-28 23:30 ` George Dunlap
2013-11-29 10:51 ` Ian Campbell
2013-11-29 11:04 ` Andrew Cooper
2013-12-02 14:01 ` Andrew Cooper
2013-12-02 14:36 ` Jan Beulich
2013-12-03 19:53 ` Andrew Cooper [this message]