Re: [PATCH v3 06/11] optee: add std call handling

From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Julien Grall <julien.grall@arm.com>
Cc: Volodymyr Babchuk <vlad.babchuk@gmail.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>,
	"tee-dev@lists.linaro.org" <tee-dev@lists.linaro.org>
Subject: Re: [PATCH v3 06/11] optee: add std call handling
Date: Thu, 17 Jan 2019 19:13:33 +0000	[thread overview]
Message-ID: <87tvi7t7vn.fsf@epam.com> (raw)
In-Reply-To: <67d7e422-246a-10db-341d-07b51cd5085f@arm.com>

Hi Julien,

Julien Grall writes:

> Hi Volodymyr,
>
> On 17/01/2019 15:21, Volodymyr Babchuk wrote:
>> Julien Grall writes:
>>
>>> Hi Volodymyr,
>>>
>>> On 18/12/2018 21:11, Volodymyr Babchuk wrote:
>>>> From: Volodymyr Babchuk <vlad.babchuk@gmail.com>
>>>>
>>>> The main way to communicate with OP-TEE is to issue standard SMCCC
>>>> call. "Standard" is a SMCCC term and it means that call can be
>>>> interrupted and OP-TEE can return control to NW before completing
>>>> the call.
>>>>
>>>> In contrast with fast calls, where arguments and return values
>>>> are passed in registers, standard calls use shared memory. Register
>>>> pair a1,a2 holds 64-bit PA of command buffer, where all arguments
>>>> are stored and which is used to return data. OP-TEE internally
>>>> copies contents of this buffer into own secure memory before accessing
>>>> and validating any data in command buffer. This is done to make sure
>>>> that NW will not change contents of the validated parameters.
>>>>
>>>> Mediator needs to do the same for number of reasons:
>>>>
>>>> 1. To make sure that guest will not change data after validation.
>>>> 2. To translate IPAs to PAs in the command buffer (this is not done
>>>>      in this patch).
>>>> 3. To hide translated address from guest, so it will not be able
>>>>      to do IPA->PA translation by misusing mediator.
>>>>
>>>> During standard call OP-TEE can issue multiple "RPC returns", asking
>>>> NW to do some work for OP-TEE. NW then issues special call
>>>> OPTEE_SMC_CALL_RETURN_FROM_RPC to resume handling of the original call.
>>>> Thus, mediator needs to maintain context for original standard call
>>>> during multiple SMCCC calls.
>>>>
>>>> Standard call is considered complete, when returned value is
>>>> not a RPC request.
>>>>
>>>> Signed-off-by: Volodymyr Babchuk <vlad.babchuk@gmail.com>
>>>> ---
>>>>
>>>>    Changes from v2:
>>>>     - renamed struct domain_ctx to struct optee_domain
>>>>     - fixed coding style
>>>>     - Now I use access_guest_memory_by_ipa() instead of mappings
>>>>       to read command buffer
>>>>     - Added tracking for in flight calls, so guest can't resume
>>>>       the same call from two CPUs simultaniously
>>>>
>>>>    xen/arch/arm/tee/optee.c     | 319 ++++++++++++++++++++++++++++++++++-
>>>>    xen/include/asm-arm/domain.h |   3 +
>>>>    2 files changed, 320 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
>>>> index 584241b03a..dc90e2ed8e 100644
>>>> --- a/xen/arch/arm/tee/optee.c
>>>> +++ b/xen/arch/arm/tee/optee.c
>>>> @@ -12,6 +12,8 @@
>>>>     */
>>>>      #include <xen/device_tree.h>
>>>> +#include <xen/domain_page.h>
>>>> +#include <xen/guest_access.h>
>>>>    #include <xen/sched.h>
>>>>    #include <asm/smccc.h>
>>>>    #include <asm/tee/tee.h>
>>>> @@ -22,11 +24,38 @@
>>>>    /* Client ID 0 is reserved for hypervisor itself */
>>>>    #define OPTEE_CLIENT_ID(domain) (domain->domain_id + 1)
>>>>    +/*
>>>> + * Maximal number of concurrent standard calls from one guest. This
>>>> + * corresponds to OPTEE configuration option CFG_NUM_THREADS, because
>>>> + * OP-TEE spawns a thread for every standard call.
>>>
>>> Looking at OP-TEE, CFG_NUM_THREADS will vary depending on the
>>> platform. Is there any way to probe that number of threads from Xen?
>> Yes, this is per-platform configuration.
>> Easiest way is to add Fast SMC to OP-TEE that will report this
>> parameter.
>> Jens, what do you think about adding additional call?
>>
>>> In any case, I think we should update the comment to reflect that this
>>> seems to be the maximum CFG_NUM_THREADS supported by any upstream
>>> platform.
>>
>> Actually, OP-TEE protocol have possibility to handle limited number of
>> threads correctly. OP-TEE can report that all threads are busy and
>> client will wait for a free one. XEN can do the same, although this is not
>> implemented in this patch series. But I can add this.
>
> Could you expand by wait? Will it block in OP-TEE/Xen or does it
> return to the guest?
It returns to the guest with response code
OPTEE_SMC_RETURN_ETHREAD_LIMIT. Linux driver blocks calling application
thread until one of the calls to OP-TEE is finished. Then driver awakens
one of the blocked threads, so it can perform the call.

>>
>> Basically this means that all can work correctly even with
>> MAX_STD_CALLS==1. It just will be not so efficient.
>
> Given the OS is not aware of the number of threads, The problem would
> be the same without Xen between. Am I right?
Exactly.

> [...]
>
>>>> +
>>>> +    /*
>>>> +     * Command buffer should start at page boundary.
>>>> +     * This is OP-TEE ABI requirement.
>>>> +     */
>>>> +    if ( call->guest_arg_ipa & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1) )
>>>> +        return false;
>>>> +
>>>> +    call->xen_arg = _xmalloc(OPTEE_MSG_NONCONTIG_PAGE_SIZE,
>>>> +                             OPTEE_MSG_NONCONTIG_PAGE_SIZE);
>>>> +    if ( !call->xen_arg )
>>>> +        return false;
>>>> +
>>>> +    BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
>>>
>>> As you use _xmalloc, you should not need this. This is only necessary
>>> if you use alloc_xenheap_page.
>>>
>> Yes, right. This is leftover from time when I mapped guest page into
>> XEN. I'll move it down, to place where I map that page.
>>
>>> I am wondering whether it is wise to allocate the memory from xenheap
>>> and not domheap. While on Arm64 (for now) xenheap and domheap are the
>>> same, on Arm32 they are different. The xenheap is at most 1GB, so
>>> pretty limited.
>> Honestly, I have no opinion there. What are limitations of domheap in
>> this case?
>
> domheap pages may not always be mapped in Xen page-tables. So you have
> to call map_domain_page/unmap_domain_page at every use.

> In practice, on Arm64, those operations are today a NOP because the
> memory is always mapped. On Arm32, domheap is never mapped so those
> operations will require to modify the page-tables.

> There would be potentially ways to optimize the Arm32 case. So I think
> this is not a big concern as it would allow to account the memory to
> the domain and take advantage of potential new safety feature around
> domheap.
>
> BTW, I am not asking to implement the accounting :). You can still
> allocate domheap memory without a domain assigned. I am only giving
> the advantages of using domheap over xenheap :).
Actually, I considered using domheap in the beginning. But for some
reason I though that domheap is limited by a total memory assigned for a
domain. Looks like I was mistaken.
So yes, I'll used domheap for a large allocations.

[...]
>>>> +{
>>>> +    struct optee_msg_arg *guest_arg;
>>>> +    struct page_info *page;
>>>> +    unsigned int i;
>>>> +    uint32_t attr;
>>>> +
>>>> +    /* copy_std_request() validated IPA for us */
>>>
>>> Not really, the guest is free to modify the stage-2 mapping on another
>>> vCPU while this is happening. I agree that the guest will shoot
>>> himself, but we at least need to not have weird behavior happening.
>>>
>>> In that case, I would check that the type is p2m_ram_rw as you don't
>>> want to write in read-only or foreign mapping.
>> How I can do this atomically? I.e. what if guest will mangle p2m right
>> after the check?
>
> What you want to prevent is Xen writing to the wrong page. The guest
> should not play with page that are shared with an higher exception
> level.
>
> get_page_from_gfn() takes a reference on the current page, that will
> be release by put_page(). Between that you are sure the page can not
> disappear under your feet.
Ahhh, right. Thank you.

>
> Furthermore, AFAIK, there are no way for an Arm guest to modify the
> p2m type of a page once inserted. It can only remove or replace with a
> newly allocated page the mapping. If the guest instructs to
> 	- remove the page, as you have a reference that page will not disappear.
> 	- replace the page with a new one, then the guest will not be
> able to see the result. Tough luck, but it was not meant to do that
> :).
>
>>
>>> Also, as copy_std_request() and copy_std_request_back may not be
>>> called in the same "thread" it would be useful if you specify a bit
>>> more the interaction.
>> I not sure what do you mean there.
>
> What I meant is the following can happen:
>
>      guest vCPU A         |   Xen
>
>      Initiate call 1
> 	                      copy_std_request()
>                               call OP-TEE
> 				-> Call "preempted"
>                      	      return to guest
>      Resume call 1
> 			      resume call in OP-TEE
> 			      copy_std_back_request()
>
Yes, this is right.

> AFAICT, the call could even resume from a different vCPU.
This is also true.

> It is not
> entirely trivial to understand this from just reading the code and the
> comment "copy_std_request() validated IPA for us" leads to think
> copy_std_request() was called right before. This may not be the
> case. So can you detail a bit more the interaction in the code?
Yes, there can be an issue. My previous approach pined guest page for a
call duration, so I was sure that IPA is still valid. But now this is
not true anymore. So yes, this comment is misleading. I'll fix both the
code and the comment.

[...]
>>>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>>>> index 175de44927..88b48697bd 100644
>>>> --- a/xen/include/asm-arm/domain.h
>>>> +++ b/xen/include/asm-arm/domain.h
>>>> @@ -97,6 +97,9 @@ struct arch_domain
>>>>        struct vpl011 vpl011;
>>>>    #endif
>>>>    +#ifdef CONFIG_TEE
>>>> +    void *tee;
>>>> +#endif
>>>
>>> Did you look whether there are any hole in arch_domain that could be re-used?
>> I thought about that. But what are chances to find 64bit-wide,
>> 64bit-aligned hole in a structure? If I remember C standard correctly,
>> there are no reasons for compiler to leave such holes.
>
> It depends on the alignment requested for each structure. Have a look
> at pahole to see the number (and size) of holes we have in some
> structures ;).
Wow, 108 bytes hole in rcu_ctrlblk :) And yes, only two 8 byte holes in
the whole xen.

--
Best regards, Volodymyr Babchuk
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel