From mboxrd@z Thu Jan  1 00:00:00 1970
From: Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>
Subject: Re: [PATCH 08/12] KVM: x86: save/load state on SMM switch
Date: Fri, 22 May 2015 16:17:13 +0200
Message-ID: <20150522141713.GC31183@potion.brq.redhat.com>
References: <1431084034-8425-1-git-send-email-pbonzini@redhat.com>
 <1431084034-8425-9-git-send-email-pbonzini@redhat.com>
 <20150521162036.GA31183@potion.brq.redhat.com>
 <555E0683.6020600@redhat.com>
 <20150521170014.GB31171@potion.brq.redhat.com>
 <555E4C4E.1010603@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, bsd@redhat.com
To: Paolo Bonzini <pbonzini@redhat.com>
Return-path: <linux-kernel-owner@vger.kernel.org>
Content-Disposition: inline
In-Reply-To: <555E4C4E.1010603@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

2015-05-21 23:21+0200, Paolo Bonzini:
> On 21/05/2015 19:00, Radim Krčmář wrote:
>>   Potentially, an NMI could be latched (while in SMM or upon exit) and
>>   serviced upon exit [...]
>>
>> This "Potentially" could be in the sense that the whole 3rd paragraph is
>> only applicable to some ancient SMM design :)
>
> It could also be in the sense that you cannot exclude an NMI coming at
> exactly the wrong time.

Yes, but it is hard to figure out how big the wrong time window is ...

Taken to the extreme, the paragraph says that we must inject an NMI that
arrived while in SMM after RSM, regardless of whether NMIs were blocked
before.  (Which is not how real hardware works.)

> If you want to go full language lawyer, it does mention it whenever
> behavior is specific to a processor family.

True, I don't know of an exception, but that does not prove the
contrary here :/

>> The 1st paragraph has quite clear sentence:
>>
>>   If NMIs were blocked before the SMI occurred, they are blocked after
>>   execution of RSM.
>>
>> so I'd just ignore the 3rd paragraph ...

It's suspicious in other ways ... I'll focus on the other part of the
sentence now

  Potentially, an NMI could be latched (while in SMM or upon exit)
                               ^^^^^^^^^^^^^^^^^^^^^

An NMI can't be latched in SMM and delivered after RSM when we
started with NMIs masked.
It was latched in SMM, so we either didn't unmask NMIs or we were
executing an NMI handler in SMM.  The first case is covered by

  If NMIs were blocked before the SMI occurred, they are blocked after
  execution of RSM.

The second case, when we specialize the above, would need to unmask NMIs
with IRET, accept an NMI, and then do RSM before IRET (because IRET
would immediately inject the latched NMI);
if the CPU unmasks NMIs in that case, I'd slap someone.

Btw. I had a good laugh at Intel's response to a similar question:
https://software.intel.com/en-us/forums/topic/305672

>> And the APM 2:10.3.3 Exceptions and Interrupts
| [...]
>> makes me think that we should unmask them unconditionally or that SMM
>> doesn't do anything with NMI masking.
>
> Actually I hadn't noticed this paragraph.  But I read it the same as the
> Intel manual (i.e. what I implemented): it doesn't say anywhere that RSM
> may cause the processor to *set* the "NMIs masked" flag.
>
> It makes no sense; as you said it's 1 bit of state!  But it seems that
> it's the architectural behavior. :(

Ok, it's sad and I'm too lazy to actually try it ...
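
For what it's worth, the reading above is easy to play with on paper.
Below is a toy model only: it is not KVM code and has not been compared
against hardware.  The struct, its fields and every function name are
made up for illustration.  It assumes that SMM entry masks NMIs and
remembers the previous mask, that taking an NMI masks further NMIs, that
IRET unmasks them, and that RSM may clear the mask but never sets it.

```c
#include <stdbool.h>

/* Hypothetical toy model of NMI masking around SMM; not KVM code. */
struct cpu {
	bool in_smm;
	bool nmi_masked;        /* the 1 bit of NMI-blocking state */
	bool nmi_latched;       /* one NMI may be held pending while masked */
	bool masked_before_smi; /* mask state remembered at SMM entry */
	int  nmis_delivered;
};

/* Deliver a latched NMI if nothing blocks it; delivery masks NMIs. */
static void deliver_pending(struct cpu *c)
{
	if (c->nmi_latched && !c->nmi_masked) {
		c->nmi_latched = false;
		c->nmi_masked = true;
		c->nmis_delivered++;
	}
}

static void raise_nmi(struct cpu *c)
{
	c->nmi_latched = true;
	deliver_pending(c);
}

/* IRET unmasks NMIs, so a latched NMI is injected right away. */
static void iret(struct cpu *c)
{
	c->nmi_masked = false;
	deliver_pending(c);
}

/* SMM entry: remember the old mask, then block NMIs. */
static void smi(struct cpu *c)
{
	c->masked_before_smi = c->nmi_masked;
	c->nmi_masked = true;
	c->in_smm = true;
}

/* Assumed reading: RSM clears the mask if it was clear before the SMI,
 * but never sets it. */
static void rsm(struct cpu *c)
{
	c->in_smm = false;
	if (!c->masked_before_smi)
		c->nmi_masked = false;
	deliver_pending(c);
}

/* Masked before the SMI, IRET inside SMM, then RSM. */
static struct cpu iret_in_smm_then_rsm(void)
{
	struct cpu c = { .nmi_masked = true };

	smi(&c);
	iret(&c);
	rsm(&c);
	return c;
}

/* Masked before the SMI, NMI handler running in SMM, RSM before IRET. */
static struct cpu rsm_inside_smm_nmi_handler(void)
{
	struct cpu c = { .nmi_masked = true };

	smi(&c);
	iret(&c);      /* unmask inside SMM */
	raise_nmi(&c); /* accepted, masks NMIs again */
	raise_nmi(&c); /* latched while the handler runs */
	rsm(&c);
	return c;
}

/* Not masked before the SMI, NMI arrives while in SMM. */
static struct cpu nmi_during_smm(void)
{
	struct cpu c = { 0 };

	smi(&c);
	raise_nmi(&c); /* latched, SMM entry masked NMIs */
	rsm(&c);
	return c;
}
```

In this model iret_in_smm_then_rsm() comes back with NMIs unmasked even
though they were masked before the SMI, which is the odd part of that
reading.  rsm_inside_smm_nmi_handler() keeps the second NMI latched
after RSM, and nmi_during_smm() delivers the latched NMI on RSM.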

>> If we can choose, less NMI nesting seems like a good idea.
>
> It would---I'm just preempting future patches from Nadav. :)

Me too :D

>                                                               That said,
> even if OVMF does do IRETs in SMM (in 64-bit mode it fills in page
> tables lazily for memory above 4GB), we do not care about asynchronous
> SMIs such as those for power management.  So we should never enter SMM
> with NMIs masked, to begin with.

Yeah, it's a stupid corner case, the place where most of the time and
sanity is lost.