From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 2 of 2 V3] x86/kexec: Change NMI and MCE
 handling on kexec path
Date: Mon, 10 Dec 2012 12:54:36 +0000
Message-ID: <50C5DB8C.4050601@citrix.com>
References: <patchbomb.1354830148@andrewcoop.uk.xensource.com>
	<3757511a785287066cfd.1354830150@andrewcoop.uk.xensource.com>
	<50C1E67F02000078000AEEEF@nat28.tlf.novell.com>
	<50C25D3D.10407@citrix.com>
	<50C5BA2A02000078000AF4F4@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <50C5BA2A02000078000AF4F4@nat28.tlf.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 10/12/12 09:32, Jan Beulich wrote:
>>>> On 07.12.12 at 22:18, Andrew Cooper<andrew.cooper3@citrix.com>  wrote:
>> On 07/12/2012 11:52, Jan Beulich wrote:
>>>>>> On 06.12.12 at 22:42, Andrew Cooper<andrew.cooper3@citrix.com>  wrote:
>>>> +
>>>> +    /* Change NMI trap handlers.  Non-crashing pcpus get nmi_crash which
>>>> +     * invokes do_nmi_crash (above), which cause them to write state and
>>>> +     * fall into a loop.  The crashing pcpu gets the nop handler to
>>>> +     * cause it to return to this function ASAP.
>>>> +     */
>>>> +    for ( i = 0; i<  nr_cpu_ids; ++i )
>>>> +        if ( idt_tables[i] )
>>>> +        {
>>>> +
>>>> +            if ( i == cpu )
>>>> +            {
>>>> +                /* Disable the interrupt stack tables for this MCE and
>>>> +                 * NMI handler (shortly to become a nop) as there is a 1
>>>> +                 * instruction race window where NMIs could be
>>>> +                 * re-enabled and corrupt the exception frame, leaving
>>>> +                 * us unable to continue on this crash path (which half
>>>> +                 * defeats the point of using the nop handler in the
>>>> +                 * first place).
>>>> +                 *
>>>> +                 * This update is safe from a security point of view, as
>>>> +                 * this pcpu is never going to try to sysret back to a
>>>> +                 * PV vcpu.
>>>> +                 */
>>>> +                set_ist(&idt_tables[i][TRAP_nmi],           IST_NONE);
>>>> +                set_ist(&idt_tables[i][TRAP_machine_check], IST_NONE);
>>>> +
>>>> +                _set_gate(&idt_tables[i][TRAP_nmi], 14, 0,&trap_nop);
>>> This makes the first set_ist() above pointless, doesn't it?
>> No.  The set_ist() is to remove the possibility of stack corruption from
>> reentrant NMIs, while the trap_nop handler is so we don't get diverted
>> into the regular NMI handler.  There is nothing the NMI handler would do
>> which could alter the outcome, and there are many cases where the
>> regular NMI handler would try to panic, starting us reentrantly on the
>> kexec path again (where we would deadlock on the one_cpu_only() check).
> I think you didn't get the point of the question: _set_gate() clears
> the IST field of the descriptor anyway, so why clear it separately
> first, and then overwrite it again?

Ah - my apologies.  I indeed was not understanding the point.

I will need to fix up the calls then.  Having _set_gate() change the IST 
as well reintroduces the security vulnerability.  I will create a new 
function similar to _set_gate() which only changes the handler and 
nothing else.

>
>>>> +            }
>>>> +            else
>>>> +                _set_gate(&idt_tables[i][TRAP_nmi], 14, 0,&nmi_crash);
>>>> +        }
>>>> +
>>>>       /* Ensure the new callback function is set before sending out the NMI.
>> */
>>>>       wmb();
>>>>   ...
>>>> +/* Enable NMIs.  No special register assumptions, and all preserved. */
>>>> +ENTRY(enable_nmis)
>>>> +        pushq %rax
>>> What's the point of saving %rax here, btw?
>>>
>>> Jan
>> Because at the moment I believe I might need to call it from asm
>> context, when doing some of the later fixes.  I figured that it was
>> better to make it safe now, rather than patch it up later.
> I don't think that's good practice - if you end up not calling the
> thing from assembly code, the question on the purpose of saving
> %rax will re-surface sooner or later. Plus the patch by itself
> wouldn't really explain either why this is so (which might be
> of interest in the context of backporting it to older trees).
>
> Jan
>

Ok - will remove.

~Andrew