From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Kai"
Subject: Re: [intel-sgx-kernel-dev] [PATCH 08/10] kvm: vmx: add guest's IA32_SGXLEPUBKEYHASHn runtime switch support
Date: Wed, 7 Jun 2017 10:51:51 +1200
Message-ID: <90924786-c132-524e-177f-7af582d49c85@linux.intel.com>
References: <20170508052434.3627-1-kai.huang@linux.intel.com> <20170508052434.3627-9-kai.huang@linux.intel.com> <58dcdb2d-6894-b0a3-8d6f-2ab752fd6d22@linux.intel.com> <6ab7ec4e-e0fa-af47-11b2-f26edcb088fb@linux.intel.com> <596dc1ad-eac7-798d-72e5-665eb7f3f2e4@linux.intel.com> <0b4697b9-0976-c8ad-e26f-4ff683318486@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Kai Huang, Paolo Bonzini, Radim Krcmar, kvm list, "intel-sgx-kernel-dev@lists.01.org", haim.cohen@intel.com, Jarkko Sakkinen
To: Andy Lutomirski
Return-path:
Received: from mga06.intel.com ([134.134.136.31]:13956 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751419AbdFFWv7 (ORCPT); Tue, 6 Jun 2017 18:51:59 -0400
In-Reply-To:
Content-Language: en-US
Sender: kvm-owner@vger.kernel.org
List-ID:

On 6/7/2017 9:22 AM, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 1:52 PM, Huang, Kai wrote:
>>
>> On 5/18/2017 7:45 PM, Huang, Kai wrote:
>>>
>>> On 5/17/2017 12:09 PM, Andy Lutomirski wrote:
>>>>
>>>> On Mon, May 15, 2017 at 5:48 PM, Huang, Kai wrote:
>>>>>
>>>>> On 5/12/2017 6:11 PM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Thu, May 11, 2017 at 9:56 PM, Huang, Kai wrote:
>>>>>>>
>>>>>>> I am not sure whether writing to the 4 MSRs would be *extremely*
>>>>>>> slow, as when the vcpu is scheduled in, KVM is already doing
>>>>>>> vmcs_load, writing to several MSRs, etc.
>>>>>>
>>>>>> I'm speculating that these MSRs may be rather unoptimized and hence
>>>>>> unusually slow.
>>>>>>
>>>>>>>
>>>>>>>> Have a percpu variable that stores the current SGXLEPUBKEYHASH
>>>>>>>> along with whatever lock is needed (probably just a mutex). Users
>>>>>>>> of EINIT will take the mutex, compare the percpu variable to the
>>>>>>>> desired value, and, if it's different, do WRMSR and update the
>>>>>>>> percpu variable.
>>>>>>>>
>>>>>>>> KVM will implement writes to SGXLEPUBKEYHASH by updating its
>>>>>>>> in-memory state but *not* changing the MSRs. KVM will trap and
>>>>>>>> emulate EINIT to support the same handling as the host. There is
>>>>>>>> no action required at all on KVM guest entry and exit.
>>>>>>>
>>>>>>> This is doable, but the SGX driver needs to do those things and
>>>>>>> expose interfaces for KVM to use. In terms of the percpu data, it
>>>>>>> is nice to have, but I am not sure whether it is mandatory, as IMO
>>>>>>> EINIT is not even in the performance-critical path. We can simply
>>>>>>> read the old value out of the MSRs and compare whether the old
>>>>>>> equals the new.
>>>>>>
>>>>>> I think the SGX driver should probably live in arch/x86, and the
>>>>>> interface could be a simple percpu variable that is exported (from
>>>>>> the main kernel image, not from a module).
>>>>>>
>>>>>>>
>>>>>>>> FWIW, I think that KVM will, in the long run, want to trap EINIT
>>>>>>>> for other reasons: someone is going to want to implement policy
>>>>>>>> for what enclaves are allowed that applies to guests as well as
>>>>>>>> the host.
>>>>>>>
>>>>>>> I am not very convinced why "what enclaves are allowed" in the
>>>>>>> host would apply to the guest. Can you elaborate? I mean, in
>>>>>>> general, virtualization just focuses on emulating hardware
>>>>>>> behavior. If a native machine is able to run any LE, the virtual
>>>>>>> machine should be able to as well (of course, with the guest's
>>>>>>> IA32_FEATURE_CONTROL[bit 17] set).
>>>>>>
>>>>>> I strongly disagree. I can imagine two classes of sensible policies
>>>>>> for launch control:
>>>>>>
>>>>>> 1. Allow everything. This seems quite sensible to me.
>>>>>>
>>>>>> 2. Allow some things, and make sure that VMs have at least as
>>>>>> restrictive a policy as host root has. After all, what's the point
>>>>>> of restricting enclaves in the host if host code can simply spawn a
>>>>>> little VM to run otherwise-disallowed enclaves?
>>>>>
>>>>> What's the current SGX driver launch control policy? Yes, allowing
>>>>> everything works for KVM, so let's skip this. Are we going to
>>>>> support allowing several LEs, or just allowing one single LE? I know
>>>>> Jarkko is doing in-kernel LE stuff but I don't know the details.
>>>>>
>>>>> I am trying to find a way that we can both avoid breaking the host
>>>>> launch control policy and stay consistent with HW behavior (from the
>>>>> guest's view). Currently we can create a KVM guest with runtime
>>>>> change to IA32_SGXLEPUBKEYHASHn either enabled or disabled. I
>>>>> introduced a Qemu parameter 'lewr' for this purpose. Actually I
>>>>> introduced the below Qemu SGX parameters for creating a guest:
>>>>>
>>>>> -sgx epc=,lehash='SHA-256 hash',lewr
>>>>>
>>>>> where 'epc' specifies the guest's EPC size, 'lehash' specifies the
>>>>> (initial) value of the guest's IA32_SGXLEPUBKEYHASHn, and 'lewr'
>>>>> specifies whether the guest is allowed to change its
>>>>> IA32_SGXLEPUBKEYHASHn at runtime.
>>>>>
>>>>> If the host only allows one single LE to run, KVM can add a
>>>>> restriction that only allows creating a KVM guest with runtime
>>>>> change to IA32_SGXLEPUBKEYHASHn disabled, so that only the
>>>>> host-allowed (single) hash can be used by the guest. From the
>>>>> guest's view, it simply has IA32_FEATURE_CONTROL[bit 17] cleared and
>>>>> has IA32_SGXLEPUBKEYHASHn with its default value being the
>>>>> host-allowed (single) hash.
>>>>>
>>>>> If the host allows several LEs (but not everything), and if we
>>>>> create a guest with 'lewr', then the behavior is not consistent with
>>>>> HW behavior, as from the guest's view of its hardware, it should be
>>>>> able to run any LE, but we have to tell the guest that it is only
>>>>> allowed to change IA32_SGXLEPUBKEYHASHn to some specific values. One
>>>>> compromise solution is that we don't allow creating a guest with
>>>>> 'lewr' specified, and in the meantime only allow creating a guest
>>>>> with host-approved hashes specified in 'lehash'. This makes the
>>>>> guest's behavior consistent with HW behavior but only allows the
>>>>> guest to run one LE (the one specified by 'lehash' when the guest is
>>>>> created).
>>>>
>>>> I'm not sure I entirely agree, for a couple of reasons.
>>>>
>>>> 1. I wouldn't be surprised if the kernel ends up implementing a
>>>> policy in which it checks all enclaves (not just LEs) for
>>>> acceptability. In fact, if the kernel sticks with the "no LE at all
>>>> or just kernel-internal LE" approach, then checking enclaves directly
>>>> against some admin- or distro-provided signer list seems reasonable.
>>>> This type of policy can't be forwarded to a guest by restricting
>>>> allowed LE signers. But this is mostly speculation since AFAIK no one
>>>> has seriously proposed any particular policy support and the plan was
>>>> to not have this for the initial implementation.
>>>>
>>>> 2. While matching hardware behavior is nice in principle, there
>>>> doesn't seem to be useful hardware behavior to match here. If the
>>>> host had a list of five allowed LE signers, how exactly would it
>>>> restrict the MSRs? They're not written atomically, so you can't
>>>> directly tell what's being written.
>>>
>>> In this case I actually plan to just allow creating a guest with the
>>> guest's runtime change to IA32_SGXLEPUBKEYHASHn disabled (without
>>> 'lewr' specified). If 'lewr' is specified, creating the guest will
>>> fail.
And we only allow creating a guest with
>>> host-allowed hash values (with 'lehash=hash-value'), and if the
>>> 'hash-value' specified by 'lehash' is not allowed by the host, we
>>> also fail to create the guest.
>>>
>>> We can only allow creating a guest with 'lewr' specified when the
>>> host allows anything.
>>>
>>> But in this way, we are restricting the guest OS's ability to run
>>> LEs, as only one LE, the one specified by the 'lehash' parameter, can
>>> be run. But I think this won't hurt much, as multiple guests are
>>> still able to run different LEs?
>>>
>>>> Also, the only way to fail an MSR write is to send #GP, and Windows
>>>> (and even Linux) may not expect that. Linux doesn't panic due to #GP
>>>> on MSR writes these days, but you still get a big fat warning. I
>>>> wouldn't be at all surprised if Windows BSODs.
>>>
>>> We cannot allow writing some particular values to the MSRs
>>> successfully while injecting #GP when writing other values to the
>>> same MSRs. So #GP is not an option.
>>>
>>>> ENCLS[EINIT], on the other hand, returns an actual error code. I'm
>>>> not sure that a sensible error code exists
>>>> ("SGX_HYPERVISOR_SAID_NO?", perhaps),
>>>
>>> It looks like no such error code exists. And we cannot return such an
>>> error code to the guest, as it would only be valid if ENCLS were run
>>> in the hypervisor.
>>>
>>>> but SGX_INVALID_EINITTOKEN seems to mean, more or less, "the CPU
>>>> thinks you're not authorized to do this", so forcing that error code
>>>> could be entirely reasonable.
>>>>
>>>> If the host policy is to allow a list of LE signers, you could
>>>> return SGX_INVALID_EINITTOKEN if the guest tries to EINIT an LE that
>>>> isn't in the list.
>>>
>>> But this would be inconsistent with HW behavior. If the hash value in
>>> the guest's IA32_SGXLEPUBKEYHASHn matches the one passed to EINIT,
>>> EINIT is not supposed to return SGX_INVALID_EINITTOKEN.
>>>
>>> I think from the VMM's perspective, emulating HW behavior so that it
>>> is consistent with real HW behavior is very important.
>>>
>>> Paolo, would you provide your comments?
>>
>> Hi all,
>>
>> This has been quiet for a while and I'd like to start the discussion
>> again. Jarkko told me that currently he only supports one LE in the
>> SGX driver, but I am not sure whether he is going to extend that in
>> the future or not. I think this might also depend on requirements from
>> customers.
>>
>> Andy,
>>
>> If we only support one LE in the driver, then we can only support the
>> same LE for all KVM guests, according to your comments that the host
>> kernel launch control policy should also apply to KVM guests? Would
>> you comment more?
>
> I'm not at all convinced that, going forward, Linux's host-side launch
> control policy will be entirely contained in the LE. I'm also not
> convinced that non-Linux guests will function at all under this type
> of policy -- what if FreeBSD's LE is different for whatever reason?

I am not convinced either. I think we need Jarkko to elaborate on how the
host-side launch control policy is implemented, or whether there is any
policy at all. I also tried to read the SGX driver code but it looks like
I couldn't find any implementation regarding this.

Hi Jarkko,

Can you elaborate on this?

Thanks,
-Kai

>
>>
>> Jarkko,
>>
>> Could you help to clarify the whole launch control policy on the host
>> side so that we can have a better understanding together?
>>
>> Thanks,
>> -Kai
>>
>>>
>>>>
>>>> --Andy
>>>>
>>
>