* EL0 app, stubdoms on ARM conf call
@ 2017-06-15 18:21 Stefano Stabellini
  2017-06-15 19:37 ` Volodymyr Babchuk
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Stabellini @ 2017-06-15 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Artem_Mygaiev, sstabellini, Andrii Anisov, vlad.babchuk,
	dario.faggioli, george.dunlap, julien.grall

Hi all,

Would you be up for joining a conf call to discuss EL0 apps and stubdoms
on ARM in preparation for Xen Developer Summit?

If so, would Wednesday the 28th of June at 9AM PST work for you?

I realize we also have the ARM community call next week, but this is a
large topic which deserves an entire slot for itself, and it would also be
nice to have scheduling experts involved.

Please reply to confirm your presence. If enough people can attend,
I'll send out meeting invites.

Cheers,

Stefano

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-15 18:21 EL0 app, stubdoms on ARM conf call Stefano Stabellini
@ 2017-06-15 19:37 ` Volodymyr Babchuk
  2017-06-15 20:14   ` Stefano Stabellini
  0 siblings, 1 reply; 9+ messages in thread
From: Volodymyr Babchuk @ 2017-06-15 19:37 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli,
	George Dunlap, Julien Grall

Hello Stefano,

On 15 June 2017 at 21:21, Stefano Stabellini <sstabellini@kernel.org> wrote:
> Would you be up for joining a conf call to discuss EL0 apps and stubdoms
> on ARM in preparation for Xen Developer Summit?
>
> If so, would Wednesday the 28th of June at 9AM PST work for you?
I would prefer a later time (like 5PM), but 9AM also works for me.


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-15 19:37 ` Volodymyr Babchuk
@ 2017-06-15 20:14   ` Stefano Stabellini
  2017-06-15 20:38     ` Volodymyr Babchuk
  2017-06-16  6:00     ` Dario Faggioli
  0 siblings, 2 replies; 9+ messages in thread
From: Stefano Stabellini @ 2017-06-15 20:14 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli,
	George Dunlap, Julien Grall, xen-devel

On Thu, 15 Jun 2017, Volodymyr Babchuk wrote:
> Hello Stefano,
> 
> On 15 June 2017 at 21:21, Stefano Stabellini <sstabellini@kernel.org> wrote:
> > Would you be up for joining a conf call to discuss EL0 apps and stubdoms
> > on ARM in preparation for Xen Developer Summit?
> >
> > If so, would Wednesday the 28th of June at 9AM PST work for you?
> I would prefer later time (like 5PM), but 9AM also works for me.
 
Wait, did you get the timezone right?

1) 9AM PST = 5PM London = 7PM Kyiv


I could do 5PM PST without troubles, but:

2) 5PM PST = 1AM London = 3AM Kyiv


I think it's best to stay with the first option :-)
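A quick way to double-check the conversion above is sketched below. Note that on Wednesday 28 June 2017 the US West Coast is on daylight time, so "PST" in this thread effectively means PDT (UTC-7); London is on BST (UTC+1) and Kyiv on EEST (UTC+3). The sketch hard-codes those summer-time offsets instead of relying on a time zone database, and the helper name `convert` is purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Summer-time UTC offsets in effect on 28 June 2017, hard-coded so the
# sketch does not depend on a tz database being installed.
PDT = timezone(timedelta(hours=-7), "PDT")   # Pacific Daylight Time
BST = timezone(timedelta(hours=1), "BST")    # British Summer Time
EEST = timezone(timedelta(hours=3), "EEST")  # Eastern European Summer Time


def convert(hour_pacific):
    """Return London and Kyiv wall-clock times for a Pacific-time slot."""
    start = datetime(2017, 6, 28, hour_pacific, 0, tzinfo=PDT)
    return {"London": start.astimezone(BST), "Kyiv": start.astimezone(EEST)}


for hour in (9, 17):
    t = convert(hour)
    print(f"{hour:02d}:00 PDT -> {t['London']:%a %H:%M} London, "
          f"{t['Kyiv']:%a %H:%M} Kyiv")
```

With Python 3.9+ and a tz database available, `zoneinfo.ZoneInfo("America/Los_Angeles")` and `zoneinfo.ZoneInfo("Europe/London")` would pick the daylight-saving offsets automatically instead of hard-coding them.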


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-15 20:14   ` Stefano Stabellini
@ 2017-06-15 20:38     ` Volodymyr Babchuk
  2017-06-16  6:00     ` Dario Faggioli
  1 sibling, 0 replies; 9+ messages in thread
From: Volodymyr Babchuk @ 2017-06-15 20:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli,
	George Dunlap, Julien Grall

On 15 June 2017 at 23:14, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> > If so, would Wednesday the 28th of June at 9AM PST work for you?
>> I would prefer later time (like 5PM), but 9AM also works for me.
>
> Wait, did you get the timezone right?
>
> 1) 9AM PST = 5PM London = 7PM Kyiv
>
>
> I could do 5PM PST without troubles, but:
>
> 2) 5PM PST = 1AM London = 3AM Kyiv
>
>
> I think it's best to stay with the first option :-)

Oh, it is *PM*. Yeah, my fault. I prefer the first option, indeed :-)

-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-15 20:14   ` Stefano Stabellini
  2017-06-15 20:38     ` Volodymyr Babchuk
@ 2017-06-16  6:00     ` Dario Faggioli
  2017-06-16 17:19       ` Stefano Stabellini
  1 sibling, 1 reply; 9+ messages in thread
From: Dario Faggioli @ 2017-06-16  6:00 UTC (permalink / raw)
  To: Stefano Stabellini, Volodymyr Babchuk
  Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov,
	George Dunlap



On Thu, 2017-06-15 at 13:14 -0700, Stefano Stabellini wrote:
> On Thu, 15 Jun 2017, Volodymyr Babchuk wrote:
> > Hello Stefano,
> > On 15 June 2017 at 21:21, Stefano Stabellini
> > <sstabellini@kernel.org> wrote:
> > > Would you be up for joining a conf call to discuss EL0 apps and
> > > stubdoms
> > > on ARM in preparation for Xen Developer Summit?
> > > 
> > > If so, would Wednesday the 28th of June at 9AM PST work for you?
> > 
> > I would prefer later time (like 5PM), but 9AM also works for me.
> 
>  
> Wait, did you get the timezone right?
> 
> 1) 9AM PST = 5PM London = 7PM Kyiv
> 
Count me in.

It would be great if someone could send a meeting invite, so that my
mailer will do the timezone conversion and set reminders, and I don't
risk showing up on the wrong day at the wrong time. :-P

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-16  6:00     ` Dario Faggioli
@ 2017-06-16 17:19       ` Stefano Stabellini
  2017-06-29 19:04         ` Volodymyr Babchuk
  0 siblings, 1 reply; 9+ messages in thread
From: Stefano Stabellini @ 2017-06-16 17:19 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov,
	Volodymyr Babchuk, George Dunlap, Julien Grall, xen-devel


On Fri, 16 Jun 2017, Dario Faggioli wrote:
> On Thu, 2017-06-15 at 13:14 -0700, Stefano Stabellini wrote:
> > On Thu, 15 Jun 2017, Volodymyr Babchuk wrote:
> > > Hello Stefano,
> > > On 15 June 2017 at 21:21, Stefano Stabellini
> > > <sstabellini@kernel.org> wrote:
> > > > Would you be up for joining a conf call to discuss EL0 apps and
> > > > stubdoms
> > > > on ARM in preparation for Xen Developer Summit?
> > > > 
> > > > If so, would Wednesday the 28th of June at 9AM PST work for you?
> > > 
> > > I would prefer later time (like 5PM), but 9AM also works for me.
> > 
> >  
> > Wait, did you get the timezone right?
> > 
> > 1) 9AM PST = 5PM London = 7PM Kyiv
> > 
> Count me in.
> 
> It would be great if someone could send an meeting invite, so that my
> mailer will do the timezone conversion and set reminders, and I don't
> risk showing up on the wrong day at the wrong time. :-P

I'll do it.


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-16 17:19       ` Stefano Stabellini
@ 2017-06-29 19:04         ` Volodymyr Babchuk
  2017-06-29 21:26           ` Dario Faggioli
  0 siblings, 1 reply; 9+ messages in thread
From: Volodymyr Babchuk @ 2017-06-29 19:04 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Oleksandr Andrushchenko,
	Dario Faggioli, George Dunlap, Oleksandr Tyshchenko, Julien Grall

Hello all,

Thank you all for the call.

As agreed, I'll provide some details on our use cases. I want
to tell you about four cases: one is OP-TEE related, while the other three
show various aspects of the virtualized coprocessor workflow.

1. OP-TEE use case: DRM playback (secure data path).

The user wants to play a DRM-protected media file. Rights holders don't
want to give the user any means to get a DRM-free copy of that media file.
If you have ever heard of Widevine on Android, this is it. Long story
short, it is possible to decrypt, decode and display a video frame in
such a way that the decrypted data is never accessible to userspace,
the kernel or even the hypervisor. This is possible only when all data
processing is done in secure mode, which leads us to OP-TEE (or
another TEE).
So, for each video frame, the media player has to call OP-TEE with the
encrypted frame data.

Good case: 24 FPS movie, optimized data path. The media player registers
shared buffers in OP-TEE only once and then reuses them on every
invocation. That is one OP-TEE call per frame, or 24 calls per
second.
Worst case: high frame rate movie (60 FPS), data path not
optimized. The media player registers a shared buffer in OP-TEE, then asks
it to process the frame, then unregisters the buffer. 60 * 3 = 180 calls per
second.

The call is done using the SMC instruction. Let's assume that the OP-TEE
mediator lives in a Stubdom. Here is how the call sequence could look:

1. DomU issues an SMC, which is trapped by the hypervisor.
2. The hypervisor uses the standard ring buffer and event mechanism
to call the Stubdom. It also blocks the DomU vCPU that caused
the trap.
3a. The Stubdom mangles the request and asks the hypervisor to issue the real SMC.
(3b. The Stubdom mangles the request and issues the SMC by itself - potentially insecure.)
4. After the real SMC, the hypervisor returns control back to the Stubdom.
5. The Stubdom mangles the return value and returns the response to the
hypervisor in a ring buffer.
6. The hypervisor unblocks the DomU vCPU and schedules it.
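The six steps above can be modelled as a toy, single-threaded simulation that records every execution context it passes through. This is only a sketch under stated assumptions: the names `Ring`, `Mediator`, `domu_smc`, `stubdom_handle` and `real_smc` are hypothetical, two bounded deques stand in for the shared-memory ring, direct method calls stand in for event notifications, and request/response mangling is a no-op. It follows variant 3a (the Stubdom asks the hypervisor to issue the real SMC).

```python
from collections import deque


class Ring:
    """Toy request/response ring shared by the hypervisor and the Stubdom.

    Hypothetical stand-in: a real ring lives in shared memory and is driven
    by producer/consumer indices plus event notifications.
    """

    def __init__(self, size=8):
        self.size = size
        self.requests = deque()
        self.responses = deque()

    def put(self, queue, item):
        if len(queue) >= self.size:
            raise RuntimeError("ring full")
        queue.append(item)


class Mediator:
    def __init__(self):
        self.ring = Ring()
        self.trace = ["DomU"]   # execution contexts, in the order they run
        self.blocked = set()    # DomU vCPUs waiting for an OP-TEE reply

    def _switch_to(self, ctx):
        self.trace.append(ctx)

    def domu_smc(self, vcpu, args):
        # Step 1: DomU issues SMC and traps into the hypervisor.
        self._switch_to("HYP")
        # Step 2: queue the request, block the caller, notify the Stubdom.
        self.ring.put(self.ring.requests, (vcpu, args))
        self.blocked.add(vcpu)
        self._switch_to("Stubdom")
        self.stubdom_handle()
        # Step 6: take the response, unblock the vCPU, resume DomU.
        done_vcpu, result = self.ring.responses.popleft()
        self.blocked.discard(done_vcpu)
        self._switch_to("DomU")
        return result

    def stubdom_handle(self):
        vcpu, args = self.ring.requests.popleft()
        mangled = args  # request mangling (e.g. address translation) omitted
        # Step 3a: ask the hypervisor to issue the real SMC on our behalf.
        self._switch_to("HYP")
        raw = self.real_smc(mangled)
        # Step 4: the hypervisor returns control to the Stubdom.
        self._switch_to("Stubdom")
        # Step 5: mangle the return value (omitted) and post the response.
        self.ring.put(self.ring.responses, (vcpu, raw))
        self._switch_to("HYP")

    @staticmethod
    def real_smc(args):
        return 0  # pretend the secure world reported success


m = Mediator()
m.domu_smc(vcpu=0, args=(1, 2, 3))  # arbitrary register values
print(" -> ".join(m.trace))
# DomU -> HYP -> Stubdom -> HYP -> Stubdom -> HYP -> DomU
```

Being single-threaded, the model does not capture scheduling delay: in a real system nothing guarantees that the Stubdom, or afterwards DomU, runs immediately after being notified.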

As you can see, there are 6 context switches
(DomU->HYP->Stubdom->HYP->Stubdom->HYP->DomU) and 2 vCPU
switches (DomU->Stubdom->DomU). Both vCPU switches are governed by the
scheduler.
When I say "governed by the scheduler" I mean that there is no
guarantee that the needed domain will be scheduled right away.
This is the sequence for one call. As mentioned above, there can be up to 180
such calls per second in this use case. That gives us 180 * 6 = 1080, or
roughly 1000, context switches per second.
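The arithmetic in the two paragraphs above can be reproduced with a small helper that counts transitions along an execution path. The function names are illustrative; here a "context switch" is any hop between consecutive execution contexts (hypervisor included), while a "vCPU switch" only counts changes between different guest domains, ignoring hypervisor hops.

```python
HYP = "HYP"

# Execution contexts visited by one mediated OP-TEE call (steps 1-6 above).
OPTEE_CALL = ["DomU", HYP, "Stubdom", HYP, "Stubdom", HYP, "DomU"]


def context_switches(path):
    """Every hop between consecutive contexts, hypervisor included."""
    return len(path) - 1


def vcpu_switches(path):
    """Hops between *different* guest domains, ignoring the hypervisor."""
    guests = [ctx for ctx in path if ctx != HYP]
    return sum(1 for a, b in zip(guests, guests[1:]) if a != b)


def switches_per_second(fps, calls_per_frame, path=OPTEE_CALL):
    return fps * calls_per_frame * context_switches(path)


print(context_switches(OPTEE_CALL), vcpu_switches(OPTEE_CALL))  # 6 2
print(switches_per_second(fps=24, calls_per_frame=1))  # good case: 144
print(switches_per_second(fps=60, calls_per_frame=3))  # worst case: 1080
```

So the good case costs about 144 context switches per second, and the worst case 1080.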


2. Coprocessor use case: coprocessor context switch.

Let's assume that the coprocessor was used by Dom1 and now it is time to
switch context, so that Dom2 can use it. Returning to the GPU case, if we
want to show 60 FPS, then we need at least 60*N context switches per second,
where N is the number of domains that use the GPU. This is a lower bound,
obviously. A context switch is done in two parts: "context switch from"
and "context switch to". The context switch procedure is device-specific,
so there should be a driver for every supported device. This driver does
the actual work. We can't have this driver in the hypervisor, so let's assume
that the driver is running in a Stubdom.
Context switches are requested by the hypervisor. So, the best-case scenario
is the following:

1. The hypervisor asks the Stubdom to do "context switch from".
2. The Stubdom sends an event back to the hypervisor when the task is done.
(The hypervisor reconfigures the IOMMU.)
3. The hypervisor asks the Stubdom to do "context switch to".
4. The Stubdom sends an event back to the hypervisor when the task is done.

You can't merge the "context switch from" and "context switch to" Stubdom
calls into one, because between steps 2 and 3 the hypervisor needs to
reconfigure the IOMMU for the GPU.
So there are 4 context switches, two of which are governed by the
scheduler. That is 60 * 4 = 240 context switches per second per domain per
coprocessor. As I said, this is a lower bound.
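Why the two Stubdom calls cannot be merged can be illustrated with a toy model in which the driver checks which domain the IOMMU currently maps. Everything here is hypothetical (`GpuDriverStubdom`, `Hypervisor`, `vcoproc_switch`, and the string placeholder for saved state); the model only encodes the ordering constraint from the text: save while the IOMMU still maps the outgoing domain, reprogram the IOMMU, then restore for the incoming domain.

```python
class GpuDriverStubdom:
    """Hypothetical device-specific vcoproc driver running in a Stubdom."""

    def __init__(self):
        self.saved = {}      # per-domain saved coprocessor context
        self.loaded = None   # domain whose context is currently on the device

    def switch_from(self, dom, iommu_owner):
        # Saving state needs the IOMMU to still map the outgoing domain.
        if iommu_owner != dom:
            raise RuntimeError(f"IOMMU maps {iommu_owner}, cannot save {dom}")
        self.saved[dom] = f"context of {dom}"  # placeholder for real state
        self.loaded = None

    def switch_to(self, dom, iommu_owner):
        # Restoring state needs the IOMMU to already map the incoming domain.
        if iommu_owner != dom:
            raise RuntimeError(f"IOMMU maps {iommu_owner}, cannot restore {dom}")
        self.loaded = dom


class Hypervisor:
    def __init__(self, driver, first_owner):
        self.driver = driver
        self.iommu_owner = first_owner
        self.switches = 0  # HYP <-> Stubdom context switches

    def call_stubdom(self, request, dom):
        self.switches += 1  # HYP -> Stubdom (event)
        request(dom, self.iommu_owner)
        self.switches += 1  # Stubdom -> HYP (completion event)

    def vcoproc_switch(self, old, new):
        self.call_stubdom(self.driver.switch_from, old)  # steps 1-2
        self.iommu_owner = new                           # IOMMU reconfiguration
        self.call_stubdom(self.driver.switch_to, new)    # steps 3-4


hyp = Hypervisor(GpuDriverStubdom(), first_owner="Dom1")
hyp.vcoproc_switch("Dom1", "Dom2")
print(hyp.switches, 60 * hyp.switches)  # 4 240
```

In this model a single merged Stubdom call would have to run both halves under one IOMMU mapping, so one of the two checks would fail.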

3. Coprocessor use case: MMIO access from a domain to a virtualized device.

Usually communication between the processor and a coprocessor works in the
following way: the processor writes a command into shared memory and then
kicks an interrupt in the coprocessor; the coprocessor processes the task,
writes the response back to shared memory and issues an IRQ to the processor.
The coprocessor is kicked by writing to one of its memory-mapped
registers.

If the vcoproc is active right now, we *might* be able to pass this MMIO
access straight to it. But in our current design we still need to
trap these accesses and route them to the driver. If the vcoproc is not
active, we always need to route MMIO accesses to the driver,
because only the driver knows what to do with these requests at that moment.

So, to summarize, the domain will write to the MMIO range every time it wants
something from the coprocessor. There can be hundreds of such accesses for *one*
frame (e.g. load texture, load shader, load geometry, run shader,
repeat). Here is how it looks:
1. DomU writes to or reads from an MMIO register.
2. Xen traps this access and notifies the Stubdom (it also blocks the DomU vCPU).
3. The Stubdom analyzes the request and does the actual write (or stores the value internally).
4. The Stubdom sends an event back to Xen.
5. Xen unblocks the DomU vCPU.

That gives us four context switches (two of them are governed by the
scheduler). As I said, there can be hundreds of such writes for every
frame, which gives us 100 * 60 * 4 = 24 000 switches per second per
domain. This is not a lower bound, but it is not an upper bound either.
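The routing rule and the rate estimate above can be written down as a small decision function. `route_mmio` and its `passthrough_supported` flag are hypothetical: the flag models the "we *might*" case for an active vcoproc, which the current design does not use, so by default every access is trapped. The cost of 4 switches per trapped access and the figure of 100 accesses per frame come from the text.

```python
from enum import Enum


class Route(Enum):
    PASSTHROUGH = "guest accesses the device registers directly"
    DRIVER = "Xen traps the access and forwards it to the Stubdom driver"


SWITCHES_PER_TRAPPED_ACCESS = 4  # DomU -> Xen -> Stubdom -> Xen -> DomU


def route_mmio(vcoproc_active, passthrough_supported=False):
    """Decide how a DomU MMIO access to the virtualized device is handled."""
    if vcoproc_active and passthrough_supported:
        return Route.PASSTHROUGH
    # An inactive vcoproc always needs the driver: only it knows what to do
    # with the access (e.g. store the value internally).
    return Route.DRIVER


def mmio_switches_per_second(accesses_per_frame, fps):
    """Context switches caused by trapped MMIO accesses (current design)."""
    return accesses_per_frame * fps * SWITCHES_PER_TRAPPED_ACCESS


print(route_mmio(vcoproc_active=True))                              # Route.DRIVER
print(route_mmio(vcoproc_active=True, passthrough_supported=True))  # Route.PASSTHROUGH
print(mmio_switches_per_second(accesses_per_frame=100, fps=60))     # 24000
```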

4. Coprocessor use case: interrupt from a virtualized device to a domain.
As I said, the coprocessor sends an interrupt back when it finishes a
task. Again, the driver needs to process this interrupt before forwarding
it to the DomU:

1. Xen receives the interrupt and routes it to the Stubdom (probably the vGIC
can do this for us, so we will not trap into HYP).
2. The Stubdom receives the interrupt, handles it and asks Xen to inject it
into the DomU.

Two context switches, both governed by the scheduler. This is an additional
100 * 60 * 2 = 12 000 switches per second.
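To compare the four scenarios side by side, the per-event cost and event rate given for each one can be tabulated. The rates are those quoted in this message: the worst OP-TEE case, a single GPU-using domain for the context-switch case, 100 MMIO accesses per frame, and (to match the 12 000 figure) 100 interrupts per frame, all at 60 FPS. The table layout is only an illustration.

```python
FPS = 60

# (scenario, context switches per event, events per second)
SCENARIOS = [
    ("1. OP-TEE call (worst case)", 6, FPS * 3),
    ("2. vcoproc context switch (1 domain)", 4, FPS * 1),
    ("3. trapped MMIO access", 4, FPS * 100),
    ("4. forwarded interrupt", 2, FPS * 100),
]


def switches_per_second(per_event, events_per_second):
    return per_event * events_per_second


total = 0
for name, per_event, rate in SCENARIOS:
    load = switches_per_second(per_event, rate)
    total += load
    print(f"{name:<40} {load:>6}")
print(f"{'total':<40} {total:>6}")
```

Scenarios 3 and 4 account for 36 000 of the roughly 37 000 switches per second in this estimate.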


As you can see, the worst scenarios are 3 and 4. We are working on
optimizing them. The ideal solution would be to eliminate them entirely, or at
least not to trap IRQs and MMIO accesses for an active vcoproc. But we need
to trap MMIO accesses for an inactive vcoproc in any case.

I think you now have some understanding of our requirements.
Please feel free to ask any questions.
Also I want to thank Oleksandr Andrushchenko and Andrii
Anisov for briefing me about VCF workflows.


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-29 19:04         ` Volodymyr Babchuk
@ 2017-06-29 21:26           ` Dario Faggioli
  2017-06-29 22:21             ` Volodymyr Babchuk
  0 siblings, 1 reply; 9+ messages in thread
From: Dario Faggioli @ 2017-06-29 21:26 UTC (permalink / raw)
  To: Volodymyr Babchuk, Stefano Stabellini
  Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Oleksandr Andrushchenko,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall



On Thu, 2017-06-29 at 22:04 +0300, Volodymyr Babchuk wrote:
> Hello all,
> 
Hello,

> 1. OP-TEE use case: DRM playback (secure data path).
>
> [...]
>
> This is sequence for one call. As you remember, there can be up to
> 180
> such calls per second in this use case. That gives us 180 * 6 ~= 1000
> context switches per second.
> 
Ok. This is quite a detailed, well done and useful description of the
specific characteristics of your workflow.

If possible, though, I'd like to know even more. Specifically, on a
somewhat typical system:
- how many pCPUs will you have?
- how many vCPUs will Dom0 have?
- what would Dom0 be doing (as in, what components of your overall
platform would be running there), and how busy, at least roughly, do
you expect it would be?
- how many vCPUs will DomU have?
- how many vCPUs will Stubdom have? (I'm guessing one, as there's only
1 OP-TEE, does that make sense?)
- how many other domains will there be? How many vCPUs will each one of
them have?

I understand it's a lot of questions, but it's quite important to have
this info, IMO. It doesn't have to be super-precise and totally match
the final look and setup of your final product; it "just" has to be a
representative enough example.

I'll try to explain why I think it would be useful to know all these
things. So, for instance, in the scenario you describe above, if you:
- have only 1 pCPU
- Dom0 has 1 vCPU, and it runs the standard backends, which means that,
except when DomU is doing either disk or network I/O, it's mostly idle
- DomU has 1 vCPU
- Stubdom has 1 vCPU
- there's no other domain

What I think will happen most of the time will be something like this:

[1]  DomU runs
     .
     .
[2]  DomU calls SMC
     Xen blocks DomU
     Xen wakes Stubdom
[3]  Stubdom runs, does SMC
     .
     SMC done, Stubdom blocks
     Xen wakes DomU
[4]  DomU runs
     .
     .

At [1], Dom0 and Stubdom are idle, and DomU is the only running domain
(or, to be precise, vCPU), and so it runs. When, at [2], it calls SMC,
it also blocks. Therefore, at [3], it's Stubdom that is the only
runnable domain, and in fact, the scheduler lets it run. Finally, at
[4], since Stubdom has blocked again, while DomU has been woken up, the
only thing the scheduler can do is to run it (DomU).

So, as you say, even with just 1 pCPU available, if the scenario is
like I described above, there would not be the need for any fancy or
advanced improvement in the scheduler. Actually, the scheduler does
very little... It always chooses to run the only vCPU that is runnable.

On the other hand if, with still only one pCPU, there are more domains
(and hence more vCPUs) around, doing other things, and/or, if Dom0 runs
some other workload, in addition to the backends for DomU, then indeed
things may get more complicated. For example, at [4], the scheduler may
choose a different vCPU than the one of DomU, and this would probably
be a problem.
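The difference between the two situations above can be illustrated with a deliberately simplistic round-robin picker on one pCPU. This is not how Credit or Credit2 choose the next vCPU (they account credits and apply weights and other heuristics); `pick_next`, `after_smc_returns` and the queue order are assumptions made only to show that, at [4], DomU runs immediately when it is the only runnable vCPU, but may have to wait once another busy vCPU is queued ahead of it.

```python
from collections import deque


def pick_next(runqueue, runnable):
    """Toy round-robin pick for one pCPU (NOT Credit/Credit2 behaviour)."""
    for _ in range(len(runqueue)):
        vcpu = runqueue[0]
        runqueue.rotate(-1)  # move the head of the queue to the tail
        if vcpu in runnable:
            return vcpu
    return "idle"


def after_smc_returns(other_busy_vcpus):
    """Which vCPU gets the pCPU at [4], once Stubdom blocks and DomU wakes.

    Assumption: the freshly woken DomU is queued behind any vCPU that was
    already runnable.
    """
    runqueue = deque(["Stubdom", *other_busy_vcpus, "DomU"])
    runnable = {"DomU", *other_busy_vcpus}
    return pick_next(runqueue, runnable)


print(after_smc_returns([]))        # DomU: the only runnable vCPU
print(after_smc_returns(["Dom0"]))  # Dom0: DomU has to wait its turn
```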

What I was saying during the call is that we have a lot of tweaks and
mechanisms already in place to deal with situations like these.

E.g., if you have a decent number of pCPUs, we can use cpupools to
isolate, say, stubdomains from regular DomUs, or to isolate DomU-
Stubdom couples. Also, something similar, with a smaller degree of
isolation, but higher flexibility may be achieved with pinning. And,
finally, we can differentiate the domains among each other, within the
same pool or pinning mask, by using weights (and, with Credit2,
starting from 4.10, hopefully, with caps & reservations :-D).
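As a rough intuition for what weights do when every domain is busy, the sketch below computes an idealized proportional-share split of the pCPUs, capping each domain at its vCPU count and redistributing the excess to the others by weight. It is a simplified model under that assumption, not the actual accounting done by Credit or Credit2; `proportional_shares` and the example weights and vCPU counts are hypothetical.

```python
def proportional_shares(domains, pcpus):
    """Idealized proportional-share split of `pcpus` among always-busy domains.

    domains: {name: (weight, nr_vcpus)} with positive weights. A domain can
    never use more than nr_vcpus pCPUs; its excess goes to the others.
    """
    shares = {}
    remaining = dict(domains)
    capacity = float(pcpus)
    while remaining:
        total_weight = sum(w for w, _ in remaining.values())
        # Domains whose fair share exceeds what their vCPUs can consume.
        saturated = {
            name: n for name, (w, n) in remaining.items()
            if capacity * w / total_weight >= n
        }
        if not saturated:
            for name, (w, _) in remaining.items():
                shares[name] = capacity * w / total_weight
            break
        for name, n in saturated.items():
            shares[name] = float(n)
            capacity -= n
            del remaining[name]
    return shares


example = {"Dom0": (512, 2), "DomU": (256, 4), "DomBack": (128, 4)}
for name, share in proportional_shares(example, pcpus=4).items():
    print(f"{name}: {share:.2f} pCPUs")
```

Pinning and cpupools change which pCPUs a domain may run on, while weights only change how contended pCPUs are shared among the domains allowed to use them.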

But to try to envision which one would be the best combination of all
these mechanisms, I need the information I've asked about above. :-)

Thanks and Regards,
Dario

PS. It's a bit late here now... So I'll read the other scenario --the
one about copro-- tomorrow. But I can anticipate that I'm going to ask
the same kind of information :-)
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: EL0 app, stubdoms on ARM conf call
  2017-06-29 21:26           ` Dario Faggioli
@ 2017-06-29 22:21             ` Volodymyr Babchuk
  0 siblings, 0 replies; 9+ messages in thread
From: Volodymyr Babchuk @ 2017-06-29 22:21 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov,
	Oleksandr Andrushchenko, George Dunlap, Oleksandr Tyshchenko,
	Julien Grall, xen-devel

Hello Dario,

On 30 June 2017 at 00:26, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> On Thu, 2017-06-29 at 22:04 +0300, Volodymyr Babchuk wrote:
>> Hello all,
>>
> Hello,
>
>> 1. OP-TEE use case: DRM playback (secure data path).
>>
>> [...]
>>
>> This is sequence for one call. As you remember, there can be up to
>> 180
>> such calls per second in this use case. That gives us 180 * 6 ~= 1000
>> context switches per second.
>>
> Ok. This is a quite detailed, well done, and useful description of the
> specific characteristics of your workflow.

> If possible, though, I'd like to know even more. Specifically, on a
> somewhat typical system:
> - how much pCPUs will you have?
Four on our target platform. We could probably also bring up four A53 cores,
which would give us 8 pCPUs in total and ease things up. But let's
assume that we have 4 pCPUs for now.

> - how much vCPUs will Dom0 have?
Four for now. It probably won't need that many; I think 2 will be enough.

> - what would Dom0 be doing (as in, what components of your overall
> platform would be running there), and how busy, at least roughly, do
> you expect it would be?
It runs all hardware drivers and all backends (display, input,
network, block, sound)

> - how many vCPUs will DomU have?
It depends on DomU type. Let's say from 2 to 4.

> - how many vCPUs will Stubdom have? (I'm guessing one, as there's only
> 1 OP-TEE, does that make sense?)
Unfortunately, no. OP-TEE is SMP-capable: every pCPU can call OP-TEE
at the same time, and we want to preserve this feature.

> - how many other domains will there be? How many vCPUs will each one of
> them have?
There will be a third domain, which runs background jobs. I think those
are low-priority jobs (Artem can correct me), but they will require
all the computational power that is left.
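Tallying the vCPU counts from the answers above against the 4 pCPUs gives a first idea of how much overcommit the scheduler will face. The background-jobs domain and the GPU/video Stubdoms are left out because no vCPU counts were given for them, and giving the OP-TEE Stubdom one vCPU per pCPU (so that every pCPU can call OP-TEE concurrently) is an assumption.

```python
PCPUS = 4

# (min, max) vCPUs per domain, from the answers above. The OP-TEE Stubdom
# is assumed to need one vCPU per pCPU to keep OP-TEE calls SMP-capable.
VCPUS = {
    "Dom0": (2, 4),
    "DomU": (2, 4),
    "OP-TEE Stubdom": (PCPUS, PCPUS),
}


def overcommit(vcpus, pcpus):
    """Return total vCPUs (min, max) and the matching vCPU/pCPU ratios."""
    low = sum(lo for lo, _ in vcpus.values())
    high = sum(hi for _, hi in vcpus.values())
    return low, high, low / pcpus, high / pcpus


low, high, r_low, r_high = overcommit(VCPUS, PCPUS)
print(f"{low}-{high} vCPUs on {PCPUS} pCPUs ({r_low:.1f}x-{r_high:.1f}x overcommit)")
```

Stubdom vCPUs are blocked most of the time, so this overstates steady-state contention, but it suggests that situations where another vCPU competes with DomU for a pCPU are likely to be common.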

>
> I understand it's a lot of questions, but it's quite important to have
> these info, IMO. They don't have to be super-precise and totally match
> the final look and setup of your final product, it "just" have to be a
> representative enough example.
>
> I'll try to explain why I think it would be useful to know all these
> things. So, for instance, in the scenario you describe above, if you:
> - have only 1 pCPUs
> - Dom0 has 1 vCPU, and it runs the standard backeds. Which means,
> unless when DomU is doing either disk or network I/O, it's mostly idle
No. It also does composition for all (2 or 3) displays. It also plays
sound and so on.

> - DomU has 1 vCPU
One of the possible DomUs is Android. As you know, Android is very
hungry for resources. I don't expect it to run smoothly on one
vCPU.

> - Stubdom has 1 vCPU
> - there's no other domain
>
> What I think will happen most of the time will be something like this:
>
> [1]  DomU runs
>      .
>      .
> [2]  DomU calls SMC
>      Xen blocks DomU
>      Xen wakes Stubdom
> [3]  Stubdom runs, does SMC
>      .
>      SMC done, Stubdom blocks
>      Xen wakes DomU
> [4]  DomU runs
>      .
>      .
>
> At [1], Dom0 and Stubdom are idle, and DomU is the only running domain
> (or, to be precise, vCPU), and so it runs. When, at [2], it calls SMC,
> it also blocks. Therefore, at [3], it's Stubdom that is the only
> runnable domain, and in fact, the scheduler let it run. Finally, at
> [4], since Stubdom has blocked again, while DomU has been woken up, the
> only thing the scheduler can do is to run it (DomU).
>
> So, as you say, even with just 1 pCPU available, if the scenario is
> like I described above, there would not be the need for any fancy or
> advanced improvement in the scheduler. Actually, the scheduler does
> very few... It always choose to run the only vCPU that is runnable.
>
> On the other hand if, with still only one pCPU, there are more domains
> (and hence more vCPUs) around, doing other things, and/or, if Dom0 runs
> some other workload, in addition to the backends for DomU, then indeed
> things may get more complicated. For example, at [4], the scheduler may
> choose a different vCPU than the one of DomU, and this would probably
> be a problem.
Yes, this is what we are afraid of.

> What I was saying during the call is that we have a lot of tweaks and
> mechanisms already in place to deal with situations like these.
>
> E.g., if you have a decent amount of pCPUs, we can use cpupool, to
> isolate, say, stubdomains from regular DomUs, or to isolate DomU-
> Stubdom couples. Also, something similar, with a smaller degree of
> isolation, but higher flexibility may be achieved with pinning. And,
> finally, we can differentiate the domains among each other, within the
> same pool or pinning mask, by using weights (and, with Credit2,
> starting from 4.10, hopefully, with caps & reservations :-D).
Yes, I was thinking about weights and how they could help. Some
experiments need to be done there.

> But to try to envision which one would be the best combination of all
> these mechanisms , I need the information I've asked about above. :-)
Thank you. Feel free to ask anything you need.

Also, I want to describe the high-level setup:

Imagine that we are aiming for a next-gen car PC/entertainment system. This
system will have at least two displays:
 - instrument cluster (speedometer, odometer and so on)
 - navigation/entertainment display
 - optional display for the rear seats
Multiple input devices (touchscreens, hw keys, joysticks and knobs),
multiple audio input/output devices, GPS, modem, Wi-Fi AP, etc.

There can be different setups - with a separate driver domain, with a
separate instrument cluster domain, etc. But let's stick to the
simplest one:
 - Dom0 works with the HW and runs the app for the instrument cluster
   display. This is the highest priority domain.
 - DomU (Android, AGL, or another OS) runs navigation, plays music,
   tweets and so on. This is a lower-priority domain, but users are
   accustomed to a GUI that works smoothly :)
 - DomBack runs some background tasks, collects statistics,
   communicates with the cloud, etc.
 - DomOP-TEE: stubdom, acts as the OP-TEE mediator
 - DomGPU: stubdom, runs the GPU virtualization driver
 - DomVID: stubdom, runs the video decoder/encoder virtualization driver


-- 
WBR Volodymyr Babchuk aka lorc [+380976646013]
mailto: vlad.babchuk@gmail.com
