* Notes on stubdoms and latency on ARM @ 2017-05-18 19:00 Stefano Stabellini 2017-05-19 19:45 ` Volodymyr Babchuk 0 siblings, 1 reply; 49+ messages in thread From: Stefano Stabellini @ 2017-05-18 19:00 UTC (permalink / raw) To: xen-devel Cc: vlad.babchuk, dario.faggioli, sstabellini, julien.grall, george.dunlap Hi all, Julien, Dario, George and I had a quick meeting to discuss stubdom scheduling. These are my notes. Description of the problem: need for a place to run emulators and mediators outside of Xen, with low latency. Explanation of what EL0 apps are. What should be their interface with Xen? Could the interface be the regular hypercall interface? In that case, what's the benefit compared to stubdoms? The problem with stubdoms is latency and scheduling. It is not deterministic. We could easily improve the null scheduler to introduce some sort of non-preemptive scheduling of stubdoms on the same pcpus of the guest vcpus. It would still require manually pinning vcpus to pcpus. Then, we could add a sched_op hypercall to let the schedulers know that a stubdom is tied to a specific guest domain. At that point, the scheduling of stubdoms would become deterministic and automatic with the null scheduler. It could be done to other schedulers too, but it will be more work. The other issue with stubdoms is context switch times. Volodymyr showed that minios has much higher context switch times compared to EL0 apps. It is probably due to GIC context switch, that is skipped for EL0 apps. Maybe we could skip GIC context switch for stubdoms too, if we knew that they are not going to use the VGIC. At that point, context switch times should be very similar to EL0 apps. ACTIONS: Improve the null scheduler to enable decent stubdoms scheduling on latency sensitive systems. Investigate ways to improve context switch times on ARM. Cheers, Stefano _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
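A note on the sched_op idea above: the pairing information the toolstack would hand to the scheduler could be as small as the sketch below. Nothing in it exists in Xen today; the structure, field names and flag are invented purely to show the shape of such a hint (an implementation might instead extend an existing domctl such as XEN_DOMCTL_set_target).

    /* Hypothetical scheduler hint: stubdom "stubdom_id" services guest
     * "target_id", so run its vcpus non-preemptively on the same pcpus
     * as the guest's vcpus.  Illustrative only; not an existing ABI. */
    struct sched_stubdom_bind {
        uint32_t stubdom_id;  /* domain id of the stubdom            */
        uint32_t target_id;   /* domain id of the guest it serves    */
        uint32_t flags;       /* e.g. "run on the guest's pcpus"     */
    };

With this, the null scheduler could place each stubdom vcpu on the pcpu of the guest vcpu it serves and only run it while that guest vcpu is waiting on a request, which is the deterministic behaviour described in the notes.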
* Re: Notes on stubdoms and latency on ARM 2017-05-18 19:00 Notes on stubdoms and latency on ARM Stefano Stabellini @ 2017-05-19 19:45 ` Volodymyr Babchuk 2017-05-22 21:41 ` Stefano Stabellini ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-05-19 19:45 UTC (permalink / raw) To: Stefano Stabellini Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli, George Dunlap, Julien Grall Hi Stefano, On 18 May 2017 at 22:00, Stefano Stabellini <sstabellini@kernel.org> wrote: > Description of the problem: need for a place to run emulators and > mediators outside of Xen, with low latency. > > Explanation of what EL0 apps are. What should be their interface with > Xen? Could the interface be the regular hypercall interface? In that > case, what's the benefit compared to stubdoms? I imagined this as separate syscall interface (with finer policy rules). But this can be discussed, of course. > The problem with stubdoms is latency and scheduling. It is not > deterministic. We could easily improve the null scheduler to introduce > some sort of non-preemptive scheduling of stubdoms on the same pcpus of > the guest vcpus. It would still require manually pinning vcpus to pcpus. I see couple of other problems with stubdoms. For example, we need mechanism to load mediator stubdom before dom0. > Then, we could add a sched_op hypercall to let the schedulers know that > a stubdom is tied to a specific guest domain. What if one stubdom serves multiple domains? This is TEE use case. > The other issue with stubdoms is context switch times. Volodymyr showed > that minios has much higher context switch times compared to EL0 apps. > It is probably due to GIC context switch, that is skipped for EL0 apps. > Maybe we could skip GIC context switch for stubdoms too, if we knew that > they are not going to use the VGIC. At that point, context switch times > should be very similar to EL0 apps. So you are suggesting to create something like lightweight stubdom. I generally like this idea. But AFAIK, vGIC is used to deliver events from hypervisor to stubdom. Do you want to propose another mechanism? Also, this is sounds much like my EL0 PoC :) > ACTIONS: > Improve the null scheduler to enable decent stubdoms scheduling on > latency sensitive systems. I'm not very familiar with XEN schedulers. Looks like null scheduler is good for hard RT, but isn't fine for a generic consumer system. How do you think: is it possible to make credit2 scheduler to schedule stubdoms in the same way? > Investigate ways to improve context switch times on ARM. Do you have any tools to profile or trace XEN core? Also, I don't think that pure context switch time is the biggest issue. Even now, it allows 180 000 switches per second (if I'm not wrong). I think, scheduling latency is more important. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-19 19:45 ` Volodymyr Babchuk @ 2017-05-22 21:41 ` Stefano Stabellini 2017-05-26 19:28 ` Volodymyr Babchuk 2017-05-23 7:11 ` Dario Faggioli 2017-05-23 9:08 ` George Dunlap 2 siblings, 1 reply; 49+ messages in thread From: Stefano Stabellini @ 2017-05-22 21:41 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, George Dunlap, Julien Grall, xen-devel On Fri, 19 May 2017, Volodymyr Babchuk wrote: > On 18 May 2017 at 22:00, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > Description of the problem: need for a place to run emulators and > > mediators outside of Xen, with low latency. > > > > Explanation of what EL0 apps are. What should be their interface with > > Xen? Could the interface be the regular hypercall interface? In that > > case, what's the benefit compared to stubdoms? > I imagined this as separate syscall interface (with finer policy > rules). But this can be discussed, of course. Right, and to be clear, I am not against EL0 apps. > > The problem with stubdoms is latency and scheduling. It is not > > deterministic. We could easily improve the null scheduler to introduce > > some sort of non-preemptive scheduling of stubdoms on the same pcpus of > > the guest vcpus. It would still require manually pinning vcpus to pcpus. > I see couple of other problems with stubdoms. For example, we need > mechanism to load mediator stubdom before dom0. This can be solved: unrelated to this discussion, I had already created a project for Outreachy/GSoC to create multiple guests from device tree. https://wiki.xenproject.org/wiki/Outreach_Program_Projects#Xen_on_ARM:_create_multiple_guests_from_device_tree > > Then, we could add a sched_op hypercall to let the schedulers know that > > a stubdom is tied to a specific guest domain. > What if one stubdom serves multiple domains? This is TEE use case. It can be done. Stubdoms are typically deployed one per domain but they are not limited to that model. > > The other issue with stubdoms is context switch times. Volodymyr showed > > that minios has much higher context switch times compared to EL0 apps. > > It is probably due to GIC context switch, that is skipped for EL0 apps. > > Maybe we could skip GIC context switch for stubdoms too, if we knew that > > they are not going to use the VGIC. At that point, context switch times > > should be very similar to EL0 apps. > So you are suggesting to create something like lightweight stubdom. I > generally like this idea. But AFAIK, vGIC is used to deliver events > from hypervisor to stubdom. Do you want to propose another mechanism? There is no way out: if the stubdom needs events, then we'll have to expose and context switch the vGIC. If it doesn't, then we can skip the vGIC. However, we would have a similar problem with EL0 apps: I am assuming that EL0 apps don't need to handle interrupts, but if they do, then they might need something like a vGIC. > Also, this is sounds much like my EL0 PoC :) Yes :-) > > ACTIONS: > > Improve the null scheduler to enable decent stubdoms scheduling on > > latency sensitive systems. > I'm not very familiar with XEN schedulers. Looks like null scheduler > is good for hard RT, but isn't fine for a generic consumer system. How > do you think: is it possible to make credit2 scheduler to schedule > stubdoms in the same way? You can do more than that :-) You can use credit2 and the null scheduler simultaneously on different sets of physical cpus using cpupools. 
For example, you can use the null scheduler on 2 physical cores and credit2 on the remaining cores. To better answer your question: yes, it can be done with credit2 too, however it will obviously be more work (the null scheduler is trivial). > > Investigate ways to improve context switch times on ARM. > Do you have any tools to profile or trace XEN core? Also, I don't > think that pure context switch time is the biggest issue. Even now, it > allows 180 000 switches per second (if I'm not wrong). I think, > scheduling latency is more important. I am using the arch timer, manually reading the counter values. I know it's not ideal but it does the job. I am sure that with a combination of the null scheduler and vcpu pinning the scheduling latencies can be extremely reduced. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
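For reference, the arch timer measurements mentioned above only need a couple of instructions on ARMv8. The sketch below is freestanding GCC-style C rather than Xen's internal system-register helpers; it simply shows the technique of reading raw counter values around the code being timed.

    #include <stdint.h>

    /* Read the ARMv8 generic timer's virtual counter (cntvct_el0).  The
     * isb keeps the read from being reordered around the measured code. */
    static inline uint64_t read_cntvct(void)
    {
        uint64_t val;
        asm volatile("isb; mrs %0, cntvct_el0" : "=r" (val) : : "memory");
        return val;
    }

    /* Convert a counter delta to nanoseconds using the frequency
     * advertised in cntfrq_el0. */
    static inline uint64_t ticks_to_ns(uint64_t ticks)
    {
        uint64_t frq;
        asm volatile("mrs %0, cntfrq_el0" : "=r" (frq));
        return ticks * 1000000000ULL / frq;
    }

Two read_cntvct() calls around the context switch path, with the delta printed from a debug hook, are enough to compare stubdom and EL0 app switch costs in the way described here.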
* Re: Notes on stubdoms and latency on ARM 2017-05-22 21:41 ` Stefano Stabellini @ 2017-05-26 19:28 ` Volodymyr Babchuk 2017-05-30 17:29 ` Stefano Stabellini 2017-05-31 17:02 ` George Dunlap 0 siblings, 2 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-05-26 19:28 UTC (permalink / raw) To: Stefano Stabellini Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli, George Dunlap, Julien Grall Hello Stefano, >> > The problem with stubdoms is latency and scheduling. It is not >> > deterministic. We could easily improve the null scheduler to introduce >> > some sort of non-preemptive scheduling of stubdoms on the same pcpus of >> > the guest vcpus. It would still require manually pinning vcpus to pcpus. >> I see couple of other problems with stubdoms. For example, we need >> mechanism to load mediator stubdom before dom0. > > This can be solved: unrelated to this discussion, I had already created a > project for Outreachy/GSoC to create multiple guests from device tree. > > https://wiki.xenproject.org/wiki/Outreach_Program_Projects#Xen_on_ARM:_create_multiple_guests_from_device_tree Yes, that could be a solution. >> > The other issue with stubdoms is context switch times. Volodymyr showed >> > that minios has much higher context switch times compared to EL0 apps. >> > It is probably due to GIC context switch, that is skipped for EL0 apps. >> > Maybe we could skip GIC context switch for stubdoms too, if we knew that >> > they are not going to use the VGIC. At that point, context switch times >> > should be very similar to EL0 apps. >> So you are suggesting to create something like lightweight stubdom. I >> generally like this idea. But AFAIK, vGIC is used to deliver events >> from hypervisor to stubdom. Do you want to propose another mechanism? > > There is no way out: if the stubdom needs events, then we'll have to > expose and context switch the vGIC. If it doesn't, then we can skip the > vGIC. However, we would have a similar problem with EL0 apps: I am > assuming that EL0 apps don't need to handle interrupts, but if they do, > then they might need something like a vGIC. Hm. Correct me, but if we want make stubdom to handle some requests (e.g. emulate MMIO access), then it needs events, and thus it needs interrupts. At least, I'm not aware about any other mechanism, that allows hypervisor to signal to a domain. On other hand, EL0 app (as I see them) does not need such events. Basically, you just call function `handle_mmio()` right in the app. So, apps can live without interrupts and they still be able to handle request. >> I'm not very familiar with XEN schedulers. Looks like null scheduler >> is good for hard RT, but isn't fine for a generic consumer system. How >> do you think: is it possible to make credit2 scheduler to schedule >> stubdoms in the same way? > > You can do more than that :-) > You can use credit2 and the null scheduler simultaneously on different > sets of physical cpus using cpupools. For example, you can use the null > scheduler on 2 physical cores and credit2 on the remaining cores. Wow. Didn't know that. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-26 19:28 ` Volodymyr Babchuk @ 2017-05-30 17:29 ` Stefano Stabellini 2017-05-30 17:33 ` Julien Grall 2017-05-31 9:09 ` George Dunlap 2017-05-31 17:02 ` George Dunlap 1 sibling, 2 replies; 49+ messages in thread From: Stefano Stabellini @ 2017-05-30 17:29 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, George Dunlap, Julien Grall, xen-devel On Fri, 26 May 2017, Volodymyr Babchuk wrote: > >> > The other issue with stubdoms is context switch times. Volodymyr showed > >> > that minios has much higher context switch times compared to EL0 apps. > >> > It is probably due to GIC context switch, that is skipped for EL0 apps. > >> > Maybe we could skip GIC context switch for stubdoms too, if we knew that > >> > they are not going to use the VGIC. At that point, context switch times > >> > should be very similar to EL0 apps. > >> So you are suggesting to create something like lightweight stubdom. I > >> generally like this idea. But AFAIK, vGIC is used to deliver events > >> from hypervisor to stubdom. Do you want to propose another mechanism? > > > > There is no way out: if the stubdom needs events, then we'll have to > > expose and context switch the vGIC. If it doesn't, then we can skip the > > vGIC. However, we would have a similar problem with EL0 apps: I am > > assuming that EL0 apps don't need to handle interrupts, but if they do, > > then they might need something like a vGIC. > Hm. Correct me, but if we want make stubdom to handle some requests > (e.g. emulate MMIO access), then it needs events, and thus it needs > interrupts. At least, I'm not aware about any other mechanism, that > allows hypervisor to signal to a domain. The stubdom could do polling and avoid interrupts for example, but that would probably not be desirable. > On other hand, EL0 app (as I see them) does not need such events. > Basically, you just call function `handle_mmio()` right in the app. > So, apps can live without interrupts and they still be able to handle > request. That's true. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-30 17:29 ` Stefano Stabellini @ 2017-05-30 17:33 ` Julien Grall 2017-06-01 10:28 ` Julien Grall 2017-05-31 9:09 ` George Dunlap 1 sibling, 1 reply; 49+ messages in thread From: Julien Grall @ 2017-05-30 17:33 UTC (permalink / raw) To: Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov, George Dunlap On 30/05/17 18:29, Stefano Stabellini wrote: > On Fri, 26 May 2017, Volodymyr Babchuk wrote: >>>>> The other issue with stubdoms is context switch times. Volodymyr showed >>>>> that minios has much higher context switch times compared to EL0 apps. >>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. >>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that >>>>> they are not going to use the VGIC. At that point, context switch times >>>>> should be very similar to EL0 apps. >>>> So you are suggesting to create something like lightweight stubdom. I >>>> generally like this idea. But AFAIK, vGIC is used to deliver events >>>> from hypervisor to stubdom. Do you want to propose another mechanism? >>> >>> There is no way out: if the stubdom needs events, then we'll have to >>> expose and context switch the vGIC. If it doesn't, then we can skip the >>> vGIC. However, we would have a similar problem with EL0 apps: I am >>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>> then they might need something like a vGIC. >> Hm. Correct me, but if we want make stubdom to handle some requests >> (e.g. emulate MMIO access), then it needs events, and thus it needs >> interrupts. At least, I'm not aware about any other mechanism, that >> allows hypervisor to signal to a domain. > > The stubdom could do polling and avoid interrupts for example, but that > would probably not be desirable. The polling can be minimized if you block the vCPU when there are nothing to do. It would get unblock when you have to schedule him because of a request. > > >> On other hand, EL0 app (as I see them) does not need such events. >> Basically, you just call function `handle_mmio()` right in the app. >> So, apps can live without interrupts and they still be able to handle >> request. > > That's true. > Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-30 17:33 ` Julien Grall @ 2017-06-01 10:28 ` Julien Grall 2017-06-17 0:17 ` Volodymyr Babchuk 0 siblings, 1 reply; 49+ messages in thread From: Julien Grall @ 2017-06-01 10:28 UTC (permalink / raw) To: Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov, George Dunlap Hi, On 30/05/17 18:33, Julien Grall wrote: > > > On 30/05/17 18:29, Stefano Stabellini wrote: >> On Fri, 26 May 2017, Volodymyr Babchuk wrote: >>>>>> The other issue with stubdoms is context switch times. Volodymyr >>>>>> showed >>>>>> that minios has much higher context switch times compared to EL0 >>>>>> apps. >>>>>> It is probably due to GIC context switch, that is skipped for EL0 >>>>>> apps. >>>>>> Maybe we could skip GIC context switch for stubdoms too, if we >>>>>> knew that >>>>>> they are not going to use the VGIC. At that point, context switch >>>>>> times >>>>>> should be very similar to EL0 apps. >>>>> So you are suggesting to create something like lightweight stubdom. I >>>>> generally like this idea. But AFAIK, vGIC is used to deliver events >>>>> from hypervisor to stubdom. Do you want to propose another mechanism? >>>> >>>> There is no way out: if the stubdom needs events, then we'll have to >>>> expose and context switch the vGIC. If it doesn't, then we can skip the >>>> vGIC. However, we would have a similar problem with EL0 apps: I am >>>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>>> then they might need something like a vGIC. >>> Hm. Correct me, but if we want make stubdom to handle some requests >>> (e.g. emulate MMIO access), then it needs events, and thus it needs >>> interrupts. At least, I'm not aware about any other mechanism, that >>> allows hypervisor to signal to a domain. >> >> The stubdom could do polling and avoid interrupts for example, but that >> would probably not be desirable. > > The polling can be minimized if you block the vCPU when there are > nothing to do. It would get unblock when you have to schedule him > because of a request. Thinking a bit more about this. So far, we rely on the domain to use the vGIC interrupt controller which require the context switch. We could also implement a dummy interrupt controller to handle a predefined limited amount of interrupts which would allow asynchronous support in stubdom and an interface to support upcall via the interrupt exception vector. This is something that would be more tricky to do with EL0 app as there is no EL0 vector exception. Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-01 10:28 ` Julien Grall @ 2017-06-17 0:17 ` Volodymyr Babchuk 0 siblings, 0 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-06-17 0:17 UTC (permalink / raw) To: Julien Grall Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, George Dunlap, xen-devel Hello Julien, >> The polling can be minimized if you block the vCPU when there are >> nothing to do. It would get unblock when you have to schedule him >> because of a request. > Thinking a bit more about this. So far, we rely on the domain to use the > vGIC interrupt controller which require the context switch. > > We could also implement a dummy interrupt controller to handle a predefined > limited amount of interrupts which would allow asynchronous support in > stubdom and an interface to support upcall via the interrupt exception > vector. > > This is something that would be more tricky to do with EL0 app as there is > no EL0 vector exception. > Actually, your idea about blocking the vcpu is very interesting. Then we don't need a vGIC at all. For example, when the stubdomain has finished handling a request, it can issue a hypercall "block me until new requests". Xen blocks the vcpu at this moment and unblocks it only when another request is ready. This is a very promising idea. Need to think about it further. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
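The "block me until new requests" loop can in fact be written against the existing guest interface: SCHEDOP_block from xen/include/public/sched.h blocks the calling vcpu until Xen wakes it again. In the sketch below only HYPERVISOR_sched_op/SCHEDOP_block are existing interfaces; the request-ring helpers are placeholders for whatever shared-memory protocol Xen and the stubdom would agree on.

    #include <stddef.h>

    /* Placeholder request-ring API: not an existing Xen interface. */
    struct req_ring;
    int  request_pending(struct req_ring *r);
    void handle_request(struct req_ring *r);

    /* sched_op hypercall wrapper as provided by the guest environment
     * (e.g. mini-OS); declared here so the sketch is self-contained. */
    long HYPERVISOR_sched_op(int cmd, void *arg);
    #define SCHEDOP_block 1    /* from xen/include/public/sched.h */

    static void stubdom_service_loop(struct req_ring *ring)
    {
        for ( ;; )
        {
            while ( request_pending(ring) )
                handle_request(ring);
            /* Nothing left to do: give the pcpu back until Xen wakes the
             * vcpu because a new request has been queued. */
            HYPERVISOR_sched_op(SCHEDOP_block, NULL);
        }
    }

The wake-up has to be race-free against the final request_pending() check; marking an event pending before unblocking the vcpu is what would provide that.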
* Re: Notes on stubdoms and latency on ARM 2017-05-30 17:29 ` Stefano Stabellini 2017-05-30 17:33 ` Julien Grall @ 2017-05-31 9:09 ` George Dunlap 2017-05-31 15:53 ` Dario Faggioli 2017-05-31 17:45 ` Stefano Stabellini 1 sibling, 2 replies; 49+ messages in thread From: George Dunlap @ 2017-05-31 9:09 UTC (permalink / raw) To: Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov, Julien Grall On 30/05/17 18:29, Stefano Stabellini wrote: > On Fri, 26 May 2017, Volodymyr Babchuk wrote: >>>>> The other issue with stubdoms is context switch times. Volodymyr showed >>>>> that minios has much higher context switch times compared to EL0 apps. >>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. >>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that >>>>> they are not going to use the VGIC. At that point, context switch times >>>>> should be very similar to EL0 apps. >>>> So you are suggesting to create something like lightweight stubdom. I >>>> generally like this idea. But AFAIK, vGIC is used to deliver events >>>> from hypervisor to stubdom. Do you want to propose another mechanism? >>> >>> There is no way out: if the stubdom needs events, then we'll have to >>> expose and context switch the vGIC. If it doesn't, then we can skip the >>> vGIC. However, we would have a similar problem with EL0 apps: I am >>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>> then they might need something like a vGIC. >> Hm. Correct me, but if we want make stubdom to handle some requests >> (e.g. emulate MMIO access), then it needs events, and thus it needs >> interrupts. At least, I'm not aware about any other mechanism, that >> allows hypervisor to signal to a domain. > > The stubdom could do polling and avoid interrupts for example, but that > would probably not be desirable. > > >> On other hand, EL0 app (as I see them) does not need such events. >> Basically, you just call function `handle_mmio()` right in the app. >> So, apps can live without interrupts and they still be able to handle >> request. > > That's true. Well if they're in a separate security zone, that's not going to work. You have to have a defined interface between things and sanitize inputs between them. Furthermore, you probably want something like a stable interface with some level of backwards compatibility, which is not something the internal hypervisor interfaces are designed for. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-31 9:09 ` George Dunlap @ 2017-05-31 15:53 ` Dario Faggioli 2017-05-31 16:17 ` Volodymyr Babchuk 2017-05-31 17:45 ` Stefano Stabellini 1 sibling, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-05-31 15:53 UTC (permalink / raw) To: George Dunlap, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov [-- Attachment #1.1: Type: text/plain, Size: 1281 bytes --] On Wed, 2017-05-31 at 10:09 +0100, George Dunlap wrote: > On 30/05/17 18:29, Stefano Stabellini wrote: > > On Fri, 26 May 2017, Volodymyr Babchuk wrote: > > > On other hand, EL0 app (as I see them) does not need such events. > > > Basically, you just call function `handle_mmio()` right in the > > > app. > > > So, apps can live without interrupts and they still be able to > > > handle > > > request. > > > > That's true. > > Well if they're in a separate security zone, that's not going to > work. > You have to have a defined interface between things and sanitize > inputs > between them. > Exactly, I was about to ask almost the same thing. In fact, if you are "not" in Xen, as in, you are (and want to be there by design) in an entity that is scheduled by Xen, and runs at a different privilege level than Xen code, how come you can just call random hypervisor functions? Or am I still missing something (of either ARM in general, or of these Apps in particular)? Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-31 15:53 ` Dario Faggioli @ 2017-05-31 16:17 ` Volodymyr Babchuk 0 siblings, 0 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-05-31 16:17 UTC (permalink / raw) To: Dario Faggioli Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, George Dunlap, Julien Grall, Stefano Stabellini Hi Dario, >> > > On other hand, EL0 app (as I see them) does not need such events. >> > > Basically, you just call function `handle_mmio()` right in the >> > > app. >> > > So, apps can live without interrupts and they still be able to >> > > handle >> > > request. >> > >> > That's true. >> >> Well if they're in a separate security zone, that's not going to >> work. >> You have to have a defined interface between things and sanitize >> inputs >> between them. >> > Exactly, I was about to ask almost the same thing. > > In fact, if you are "not" in Xen, as in, you are (and want to be there > by design) in an entity that is scheduled by Xen, and runs at a > different privilege level than Xen code, how come you can just call > random hypervisor functions? It is impossible, indeed. As I said earlier, the interface between the app and the hypervisor would be similar to the hypercall interface (or it would be the hypercall interface itself). ARM provides a native interface for syscalls in hypervisor mode. That means that, if you wish, you can handle both hypercalls (as a hypervisor) and syscalls (as an "OS" for apps). -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
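For completeness, the "native interface for syscalls" referred to here is that AArch64 can route SVCs executed at EL0 to the hypervisor (for example when HCR_EL2.TGE is set while the app runs), so an app-to-Xen call can look like an ordinary syscall stub. Everything below other than the SVC instruction itself (the call-number register, the argument convention, the return value) is an invented convention for illustration.

    /* Hypothetical EL0 app call stub: call number in x8, arguments in
     * x0-x2, result returned in x0. */
    static inline long app_call(long nr, long a0, long a1, long a2)
    {
        register long x0 asm("x0") = a0;
        register long x1 asm("x1") = a1;
        register long x2 asm("x2") = a2;
        register long x8 asm("x8") = nr;

        asm volatile("svc #0"
                     : "+r" (x0)
                     : "r" (x1), "r" (x2), "r" (x8)
                     : "memory");
        return x0;
    }

On the Xen side such a call arrives as a synchronous exception from EL0, so it can be dispatched through its own, finer-grained policy table rather than the guest hypercall path, which is the distinction being drawn here.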
* Re: Notes on stubdoms and latency on ARM 2017-05-31 9:09 ` George Dunlap 2017-05-31 15:53 ` Dario Faggioli @ 2017-05-31 17:45 ` Stefano Stabellini 2017-06-01 10:48 ` Julien Grall 2017-06-01 10:52 ` George Dunlap 1 sibling, 2 replies; 49+ messages in thread From: Stefano Stabellini @ 2017-05-31 17:45 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, Julien Grall, xen-devel On Wed, 31 May 2017, George Dunlap wrote: > On 30/05/17 18:29, Stefano Stabellini wrote: > > On Fri, 26 May 2017, Volodymyr Babchuk wrote: > >>>>> The other issue with stubdoms is context switch times. Volodymyr showed > >>>>> that minios has much higher context switch times compared to EL0 apps. > >>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. > >>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that > >>>>> they are not going to use the VGIC. At that point, context switch times > >>>>> should be very similar to EL0 apps. > >>>> So you are suggesting to create something like lightweight stubdom. I > >>>> generally like this idea. But AFAIK, vGIC is used to deliver events > >>>> from hypervisor to stubdom. Do you want to propose another mechanism? > >>> > >>> There is no way out: if the stubdom needs events, then we'll have to > >>> expose and context switch the vGIC. If it doesn't, then we can skip the > >>> vGIC. However, we would have a similar problem with EL0 apps: I am > >>> assuming that EL0 apps don't need to handle interrupts, but if they do, > >>> then they might need something like a vGIC. > >> Hm. Correct me, but if we want make stubdom to handle some requests > >> (e.g. emulate MMIO access), then it needs events, and thus it needs > >> interrupts. At least, I'm not aware about any other mechanism, that > >> allows hypervisor to signal to a domain. > > > > The stubdom could do polling and avoid interrupts for example, but that > > would probably not be desirable. > > > > > >> On other hand, EL0 app (as I see them) does not need such events. > >> Basically, you just call function `handle_mmio()` right in the app. > >> So, apps can live without interrupts and they still be able to handle > >> request. > > > > That's true. > > Well if they're in a separate security zone, that's not going to work. > You have to have a defined interface between things and sanitize inputs > between them. Why? The purpose of EL0 apps is not to do checks on VM traps in Xen but in a different privilege level instead. Maybe I misunderstood what you are saying? Specifically, what "inputs" do you think should be sanitized in Xen before jumping into the EL0 app? > Furthermore, you probably want something like a stable > interface with some level of backwards compatibility, which is not > something the internal hypervisor interfaces are designed for. I don't think we should provide that. If the user wants a stable interface, she can use domains. I suggested that the code for the EL0 app should come out of the Xen repository directly. Like for the Xen tools, they would be expected to be always in-sync. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-31 17:45 ` Stefano Stabellini @ 2017-06-01 10:48 ` Julien Grall 2017-06-01 10:52 ` George Dunlap 1 sibling, 0 replies; 49+ messages in thread From: Julien Grall @ 2017-06-01 10:48 UTC (permalink / raw) To: Stefano Stabellini, George Dunlap Cc: Volodymyr Babchuk, Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov Hi Stefano, On 31/05/17 18:45, Stefano Stabellini wrote: > On Wed, 31 May 2017, George Dunlap wrote: >> On 30/05/17 18:29, Stefano Stabellini wrote: >>> On Fri, 26 May 2017, Volodymyr Babchuk wrote: >>>>>>> The other issue with stubdoms is context switch times. Volodymyr showed >>>>>>> that minios has much higher context switch times compared to EL0 apps. >>>>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. >>>>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that >>>>>>> they are not going to use the VGIC. At that point, context switch times >>>>>>> should be very similar to EL0 apps. >>>>>> So you are suggesting to create something like lightweight stubdom. I >>>>>> generally like this idea. But AFAIK, vGIC is used to deliver events >>>>>> from hypervisor to stubdom. Do you want to propose another mechanism? >>>>> >>>>> There is no way out: if the stubdom needs events, then we'll have to >>>>> expose and context switch the vGIC. If it doesn't, then we can skip the >>>>> vGIC. However, we would have a similar problem with EL0 apps: I am >>>>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>>>> then they might need something like a vGIC. >>>> Hm. Correct me, but if we want make stubdom to handle some requests >>>> (e.g. emulate MMIO access), then it needs events, and thus it needs >>>> interrupts. At least, I'm not aware about any other mechanism, that >>>> allows hypervisor to signal to a domain. >>> >>> The stubdom could do polling and avoid interrupts for example, but that >>> would probably not be desirable. >>> >>> >>>> On other hand, EL0 app (as I see them) does not need such events. >>>> Basically, you just call function `handle_mmio()` right in the app. >>>> So, apps can live without interrupts and they still be able to handle >>>> request. >>> >>> That's true. >> >> Well if they're in a separate security zone, that's not going to work. >> You have to have a defined interface between things and sanitize inputs >> between them. > > Why? The purpose of EL0 apps is not to do checks on VM traps in Xen but > in a different privilege level instead. Maybe I misunderstood what you > are saying? Specifically, what "inputs" do you think should be sanitized > in Xen before jumping into the EL0 app? > > >> Furthermore, you probably want something like a stable >> interface with some level of backwards compatibility, which is not >> something the internal hypervisor interfaces are designed for. > > I don't think we should provide that. If the user wants a stable > interface, she can use domains. I suggested that the code for the EL0 > app should come out of the Xen repository directly. Like for the Xen > tools, they would be expected to be always in-sync. Realistically, even if the EL0 apps are available directly in Xen repository, they will be built as standalone binary. So any ABI change will require to inspect/testing all the EL0 apps if the change is subtle. So This sounds like to me a waste of time and resource compare to providing a stable and clearly defined ABI. 
Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-31 17:45 ` Stefano Stabellini 2017-06-01 10:48 ` Julien Grall @ 2017-06-01 10:52 ` George Dunlap 2017-06-01 10:54 ` George Dunlap ` (2 more replies) 1 sibling, 3 replies; 49+ messages in thread From: George Dunlap @ 2017-06-01 10:52 UTC (permalink / raw) To: Stefano Stabellini Cc: Artem_Mygaiev@epam.com, xen-devel@lists.xensource.com, Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, Julien Grall > On May 31, 2017, at 6:45 PM, Stefano Stabellini <sstabellini@kernel.org> wrote: > > On Wed, 31 May 2017, George Dunlap wrote: >> On 30/05/17 18:29, Stefano Stabellini wrote: >>> On Fri, 26 May 2017, Volodymyr Babchuk wrote: >>>>>>> The other issue with stubdoms is context switch times. Volodymyr showed >>>>>>> that minios has much higher context switch times compared to EL0 apps. >>>>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. >>>>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that >>>>>>> they are not going to use the VGIC. At that point, context switch times >>>>>>> should be very similar to EL0 apps. >>>>>> So you are suggesting to create something like lightweight stubdom. I >>>>>> generally like this idea. But AFAIK, vGIC is used to deliver events >>>>>> from hypervisor to stubdom. Do you want to propose another mechanism? >>>>> >>>>> There is no way out: if the stubdom needs events, then we'll have to >>>>> expose and context switch the vGIC. If it doesn't, then we can skip the >>>>> vGIC. However, we would have a similar problem with EL0 apps: I am >>>>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>>>> then they might need something like a vGIC. >>>> Hm. Correct me, but if we want make stubdom to handle some requests >>>> (e.g. emulate MMIO access), then it needs events, and thus it needs >>>> interrupts. At least, I'm not aware about any other mechanism, that >>>> allows hypervisor to signal to a domain. >>> >>> The stubdom could do polling and avoid interrupts for example, but that >>> would probably not be desirable. >>> >>> >>>> On other hand, EL0 app (as I see them) does not need such events. >>>> Basically, you just call function `handle_mmio()` right in the app. >>>> So, apps can live without interrupts and they still be able to handle >>>> request. >>> >>> That's true. >> >> Well if they're in a separate security zone, that's not going to work. >> You have to have a defined interface between things and sanitize inputs >> between them. > > Why? The purpose of EL0 apps is not to do checks on VM traps in Xen but > in a different privilege level instead. Maybe I misunderstood what you > are saying? Specifically, what "inputs" do you think should be sanitized > in Xen before jumping into the EL0 app? >> Furthermore, you probably want something like a stable >> interface with some level of backwards compatibility, which is not >> something the internal hypervisor interfaces are designed for. > > I don't think we should provide that. If the user wants a stable > interface, she can use domains. I suggested that the code for the EL0 > app should come out of the Xen repository directly. Like for the Xen > tools, they would be expected to be always in-sync. Hmm, it sounds like perhaps I misunderstood you and Volodymyr. I took “you just call function `handle_mmio()` right in the app” to mean that the *app* calls the *hypervisor* function named “handle_mmio”. 
It sounds like what he (or at least you) actually meant was that the *hypervisor* calls the function named “handle_mmio” in the *app*? But presumably the app will need to do privileged operations — change the guest’s state, read / write MMIO regions, &c. We can theoretically have Xen ‘just call functions’ in the app; but we definitely *cannot* have the app ‘just call functions’ inside of Xen — that is, not if you actually want any additional security. And that’s completely apart from the whole non-GPL discussion we had. If you want non-GPL apps, I think you definitely want a nice clean interface, or you’ll have a hard time arguing that the resulting thing is not a derived work (in spite of the separate address spaces). The two motivating factors for having apps were additional security and non-GPL implementations of device models / mediators. Having the app being able to call into Xen undermines both. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-01 10:52 ` George Dunlap @ 2017-06-01 10:54 ` George Dunlap 2017-06-01 12:40 ` Dario Faggioli 2017-06-01 18:27 ` Stefano Stabellini 2 siblings, 0 replies; 49+ messages in thread From: George Dunlap @ 2017-06-01 10:54 UTC (permalink / raw) To: Stefano Stabellini Cc: Artem_Mygaiev@epam.com, xen-devel@lists.xensource.com, Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, Julien Grall > On Jun 1, 2017, at 11:52 AM, George Dunlap <george.dunlap@citrix.com> wrote: > > >> On May 31, 2017, at 6:45 PM, Stefano Stabellini <sstabellini@kernel.org> wrote: >> >> On Wed, 31 May 2017, George Dunlap wrote: >>> On 30/05/17 18:29, Stefano Stabellini wrote: >>>> On Fri, 26 May 2017, Volodymyr Babchuk wrote: >>>>>>>> The other issue with stubdoms is context switch times. Volodymyr showed >>>>>>>> that minios has much higher context switch times compared to EL0 apps. >>>>>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. >>>>>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that >>>>>>>> they are not going to use the VGIC. At that point, context switch times >>>>>>>> should be very similar to EL0 apps. >>>>>>> So you are suggesting to create something like lightweight stubdom. I >>>>>>> generally like this idea. But AFAIK, vGIC is used to deliver events >>>>>>> from hypervisor to stubdom. Do you want to propose another mechanism? >>>>>> >>>>>> There is no way out: if the stubdom needs events, then we'll have to >>>>>> expose and context switch the vGIC. If it doesn't, then we can skip the >>>>>> vGIC. However, we would have a similar problem with EL0 apps: I am >>>>>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>>>>> then they might need something like a vGIC. >>>>> Hm. Correct me, but if we want make stubdom to handle some requests >>>>> (e.g. emulate MMIO access), then it needs events, and thus it needs >>>>> interrupts. At least, I'm not aware about any other mechanism, that >>>>> allows hypervisor to signal to a domain. >>>> >>>> The stubdom could do polling and avoid interrupts for example, but that >>>> would probably not be desirable. >>>> >>>> >>>>> On other hand, EL0 app (as I see them) does not need such events. >>>>> Basically, you just call function `handle_mmio()` right in the app. >>>>> So, apps can live without interrupts and they still be able to handle >>>>> request. >>>> >>>> That's true. >>> >>> Well if they're in a separate security zone, that's not going to work. >>> You have to have a defined interface between things and sanitize inputs >>> between them. >> >> Why? The purpose of EL0 apps is not to do checks on VM traps in Xen but >> in a different privilege level instead. Maybe I misunderstood what you >> are saying? Specifically, what "inputs" do you think should be sanitized >> in Xen before jumping into the EL0 app? > >>> Furthermore, you probably want something like a stable >>> interface with some level of backwards compatibility, which is not >>> something the internal hypervisor interfaces are designed for. >> >> I don't think we should provide that. If the user wants a stable >> interface, she can use domains. I suggested that the code for the EL0 >> app should come out of the Xen repository directly. Like for the Xen >> tools, they would be expected to be always in-sync. > > Hmm, it sounds like perhaps I misunderstood you and Volodymyr. 
I took “you just call function `handle_mmio()` right in the app” to mean that the *app* calls the *hypervisor* function named “handle_mmio”. It sounds like what he (or at least you) actually meant was that the *hypervisor* calls the function named “handle_mmio” in the *app*? > > But presumably the app will need to do privileged operations — change the guest’s state, read / write MMIO regions, &c. We can theoretically have Xen ‘just call functions’ in the app; but we definitely *cannot* have the app ‘just call functions’ inside of Xen — that is, not if you actually want any additional security. > > And that’s completely apart from the whole non-GPL discussion we had. If you want non-GPL apps, I think you definitely want a nice clean interface, or you’ll have a hard time arguing that the resulting thing is not a derived work (in spite of the separate address spaces). > > The two motivating factors for having apps were additional security and non-GPL implementations of device models / mediators. Having the app being able to call into Xen undermines both. And here I mean, “call Xen functions directly”, not “make well-defined hypercalls”. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-01 10:52 ` George Dunlap 2017-06-01 10:54 ` George Dunlap @ 2017-06-01 12:40 ` Dario Faggioli 2017-06-01 15:02 ` George Dunlap 2017-06-01 18:27 ` Stefano Stabellini 2 siblings, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-06-01 12:40 UTC (permalink / raw) To: George Dunlap, Stefano Stabellini Cc: Volodymyr Babchuk, Artem_Mygaiev@epam.com, Julien Grall, xen-devel@lists.xensource.com, Andrii Anisov [-- Attachment #1.1: Type: text/plain, Size: 1383 bytes --] On Thu, 2017-06-01 at 12:52 +0200, George Dunlap wrote: > > On May 31, 2017, at 6:45 PM, Stefano Stabellini <sstabellini@kernel > > .org> wrote: > > > > I don't think we should provide that. If the user wants a stable > > interface, she can use domains. I suggested that the code for the > > EL0 > > app should come out of the Xen repository directly. Like for the > > Xen > > tools, they would be expected to be always in-sync. > > Hmm, it sounds like perhaps I misunderstood you and Volodymyr. I > took “you just call function `handle_mmio()` right in the app” to > mean that the *app* calls the *hypervisor* function named > “handle_mmio”. > Right. That what I had understood too. > It sounds like what he (or at least you) actually meant was that the > *hypervisor* calls the function named “handle_mmio” in the *app*? > Mmm... it's clearly me that am being dense, but what do you exactly mean with "the hypervisor calls the function named handle_mmio() in the app"? In particular the "in the app" part, and how is the hypervisor going to be "in" the app... Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-01 12:40 ` Dario Faggioli @ 2017-06-01 15:02 ` George Dunlap 0 siblings, 0 replies; 49+ messages in thread From: George Dunlap @ 2017-06-01 15:02 UTC (permalink / raw) To: Dario Faggioli, Stefano Stabellini Cc: Volodymyr Babchuk, Artem_Mygaiev@epam.com, Julien Grall, xen-devel@lists.xensource.com, Andrii Anisov On 01/06/17 13:40, Dario Faggioli wrote: > On Thu, 2017-06-01 at 12:52 +0200, George Dunlap wrote: >>> On May 31, 2017, at 6:45 PM, Stefano Stabellini <sstabellini@kernel >>> .org> wrote: >>> >>> I don't think we should provide that. If the user wants a stable >>> interface, she can use domains. I suggested that the code for the >>> EL0 >>> app should come out of the Xen repository directly. Like for the >>> Xen >>> tools, they would be expected to be always in-sync. >> >> Hmm, it sounds like perhaps I misunderstood you and Volodymyr. I >> took “you just call function `handle_mmio()` right in the app” to >> mean that the *app* calls the *hypervisor* function named >> “handle_mmio”. >> > Right. That what I had understood too. > >> It sounds like what he (or at least you) actually meant was that the >> *hypervisor* calls the function named “handle_mmio” in the *app*? >> > Mmm... it's clearly me that am being dense, but what do you exactly > mean with "the hypervisor calls the function named handle_mmio() in the > app"? In particular the "in the app" part, and how is the hypervisor > going to be "in" the app... Well it sounds to me similar to what Linux would do with modules: the module has the symbols encoded somewhere in it. The hypervisor would load the "app" binary; and when the appropriate device MMIO happened, it would call the "handle_mmio()" function (which would be a bit more like an entry point). But it seems to me like having an interface where the app actively registers callbacks for specific events is a lot easier than working out how to store the dynamic linking information in the module and then parse it in Xen. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
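A registration-style interface of the kind described here could be as small as the sketch below. Every name in it is invented; it only illustrates "the app registers callbacks for specific events" as an alternative to Xen parsing dynamic-linking information out of the loaded binary.

    #include <stdint.h>

    /* Hypothetical EL0 app ABI; all names are illustrative only. */
    typedef int (*mmio_handler_t)(uint64_t addr, unsigned int size,
                                  uint64_t *data, int is_write);

    /* Provided by Xen to the app, e.g. through the syscall-style
     * interface discussed earlier in the thread. */
    int app_register_mmio_handler(uint64_t base, uint64_t size,
                                  mmio_handler_t handler);

    /* Example handler: a device register that reads as zero. */
    static int dummy_reg_handler(uint64_t addr, unsigned int size,
                                 uint64_t *data, int is_write)
    {
        if ( !is_write )
            *data = 0;
        return 0;
    }

    /* Single well-known entry point Xen calls once after loading the
     * app; the app registers what it emulates and returns. */
    int app_init(void)
    {
        /* 4K emulated region at an arbitrary example address. */
        return app_register_mmio_handler(0x2c000000UL, 0x1000,
                                         dummy_reg_handler);
    }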
* Re: Notes on stubdoms and latency on ARM 2017-06-01 10:52 ` George Dunlap 2017-06-01 10:54 ` George Dunlap 2017-06-01 12:40 ` Dario Faggioli @ 2017-06-01 18:27 ` Stefano Stabellini 2 siblings, 0 replies; 49+ messages in thread From: Stefano Stabellini @ 2017-06-01 18:27 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev@epam.com, Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, Julien Grall, xen-devel@lists.xensource.com [-- Attachment #1: Type: TEXT/PLAIN, Size: 4563 bytes --] On Thu, 1 Jun 2017, George Dunlap wrote: > > On May 31, 2017, at 6:45 PM, Stefano Stabellini <sstabellini@kernel.org> wrote: > > > > On Wed, 31 May 2017, George Dunlap wrote: > >> On 30/05/17 18:29, Stefano Stabellini wrote: > >>> On Fri, 26 May 2017, Volodymyr Babchuk wrote: > >>>>>>> The other issue with stubdoms is context switch times. Volodymyr showed > >>>>>>> that minios has much higher context switch times compared to EL0 apps. > >>>>>>> It is probably due to GIC context switch, that is skipped for EL0 apps. > >>>>>>> Maybe we could skip GIC context switch for stubdoms too, if we knew that > >>>>>>> they are not going to use the VGIC. At that point, context switch times > >>>>>>> should be very similar to EL0 apps. > >>>>>> So you are suggesting to create something like lightweight stubdom. I > >>>>>> generally like this idea. But AFAIK, vGIC is used to deliver events > >>>>>> from hypervisor to stubdom. Do you want to propose another mechanism? > >>>>> > >>>>> There is no way out: if the stubdom needs events, then we'll have to > >>>>> expose and context switch the vGIC. If it doesn't, then we can skip the > >>>>> vGIC. However, we would have a similar problem with EL0 apps: I am > >>>>> assuming that EL0 apps don't need to handle interrupts, but if they do, > >>>>> then they might need something like a vGIC. > >>>> Hm. Correct me, but if we want make stubdom to handle some requests > >>>> (e.g. emulate MMIO access), then it needs events, and thus it needs > >>>> interrupts. At least, I'm not aware about any other mechanism, that > >>>> allows hypervisor to signal to a domain. > >>> > >>> The stubdom could do polling and avoid interrupts for example, but that > >>> would probably not be desirable. > >>> > >>> > >>>> On other hand, EL0 app (as I see them) does not need such events. > >>>> Basically, you just call function `handle_mmio()` right in the app. > >>>> So, apps can live without interrupts and they still be able to handle > >>>> request. > >>> > >>> That's true. > >> > >> Well if they're in a separate security zone, that's not going to work. > >> You have to have a defined interface between things and sanitize inputs > >> between them. > > > > Why? The purpose of EL0 apps is not to do checks on VM traps in Xen but > > in a different privilege level instead. Maybe I misunderstood what you > > are saying? Specifically, what "inputs" do you think should be sanitized > > in Xen before jumping into the EL0 app? > > >> Furthermore, you probably want something like a stable > >> interface with some level of backwards compatibility, which is not > >> something the internal hypervisor interfaces are designed for. > > > > I don't think we should provide that. If the user wants a stable > > interface, she can use domains. I suggested that the code for the EL0 > > app should come out of the Xen repository directly. Like for the Xen > > tools, they would be expected to be always in-sync. > > Hmm, it sounds like perhaps I misunderstood you and Volodymyr. 
I took “you just call function `handle_mmio()` right in the app” to mean that the *app* calls the *hypervisor* function named “handle_mmio”. It sounds like what he (or at least you) actually meant was that the *hypervisor* calls the function named “handle_mmio” in the *app*? Indeed, I certainly understood Xen calls "handle_mmio" in an EL0 app. > But presumably the app will need to do privileged operations — change the guest’s state, read / write MMIO regions, &c. We can theoretically have Xen ‘just call functions’ in the app; but we definitely *cannot* have the app ‘just call functions’ inside of Xen — that is, not if you actually want any additional security. Absolutely. > And that’s completely apart from the whole non-GPL discussion we had. If you want non-GPL apps, I think you definitely want a nice clean interface, or you’ll have a hard time arguing that the resulting thing is not a derived work (in spite of the separate address spaces). That's right, I don't think EL0 apps are a good vehicle for non-GPL components. Stubdoms are better for that. > The two motivating factors for having apps were additional security and non-GPL implementations of device models / mediators. I think the two motivating factors are additional security and extremely low and deterministic latency. > Having the app being able to call into Xen undermines both. Indeed, but there needs to be a very small set of exposed calls, such as: - (un)mapping memory of a VM - inject interrupts into a VM [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
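In interface terms, that "very small set of exposed calls" could be on the order of the declarations below. The names and signatures are invented for illustration; the point is how narrow the surface an emulator or mediator actually needs can be, and that each call is a place where Xen can apply policy (which domain, which pages, which virtual interrupt).

    #include <stdint.h>

    typedef uint16_t domid_t;   /* as in Xen's public headers */

    /* Hypothetical minimal EL0 app call surface. */
    void *app_map_guest_pages(domid_t domid, uint64_t gfn,
                              unsigned int nr_pages, unsigned int prot);
    int   app_unmap_guest_pages(void *va, unsigned int nr_pages);
    int   app_inject_virq(domid_t domid, uint32_t virq);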
* Re: Notes on stubdoms and latency on ARM 2017-05-26 19:28 ` Volodymyr Babchuk 2017-05-30 17:29 ` Stefano Stabellini @ 2017-05-31 17:02 ` George Dunlap 2017-06-17 0:14 ` Volodymyr Babchuk 1 sibling, 1 reply; 49+ messages in thread From: George Dunlap @ 2017-05-31 17:02 UTC (permalink / raw) To: Volodymyr Babchuk, Stefano Stabellini Cc: Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov, Julien Grall On 26/05/17 20:28, Volodymyr Babchuk wrote: >> There is no way out: if the stubdom needs events, then we'll have to >> expose and context switch the vGIC. If it doesn't, then we can skip the >> vGIC. However, we would have a similar problem with EL0 apps: I am >> assuming that EL0 apps don't need to handle interrupts, but if they do, >> then they might need something like a vGIC. > Hm. Correct me, but if we want make stubdom to handle some requests > (e.g. emulate MMIO access), then it needs events, and thus it needs > interrupts. At least, I'm not aware about any other mechanism, that > allows hypervisor to signal to a domain. > On other hand, EL0 app (as I see them) does not need such events. > Basically, you just call function `handle_mmio()` right in the app. > So, apps can live without interrupts and they still be able to handle > request. So remember that "interrupt" and "event" are basically the same as "structured callback". When anything happens that Xen wants to tell the EL0 app about, it has to have a way of telling it. If the EL0 app is handling a device, it has to have some way of getting interrupts from that device; if it needs to emulate devices sent to the guest, it needs some way to tell Xen to deliver an interrupt to the guest. Now, we could make the EL0 app interface "interruptless". Xen could write information about pending events in a shared memory region, and the EL0 app could check that before calling some sort of block() hypercall, and check it again when it returns from the block() call. But the shared event information starts to look an awful lot like events and/or pending bits on an interrupt controller -- the only difference being that you aren't interrupted if you're already running. I'm pretty sure you could run in this mode using the existing interfaces if you didn't want the hassle of dealing with asynchrony. If that's the case, then why bother inventing an entirely new interface, with its own bugs and duplication of functionality? Why not just use what we already have? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-31 17:02 ` George Dunlap @ 2017-06-17 0:14 ` Volodymyr Babchuk 2017-06-19 9:37 ` George Dunlap 0 siblings, 1 reply; 49+ messages in thread From: Volodymyr Babchuk @ 2017-06-17 0:14 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, Julien Grall, xen-devel Hello George, On 31 May 2017 at 20:02, George Dunlap <george.dunlap@citrix.com> wrote: >>> There is no way out: if the stubdom needs events, then we'll have to >>> expose and context switch the vGIC. If it doesn't, then we can skip the >>> vGIC. However, we would have a similar problem with EL0 apps: I am >>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>> then they might need something like a vGIC. >> Hm. Correct me, but if we want make stubdom to handle some requests >> (e.g. emulate MMIO access), then it needs events, and thus it needs >> interrupts. At least, I'm not aware about any other mechanism, that >> allows hypervisor to signal to a domain. >> On other hand, EL0 app (as I see them) does not need such events. >> Basically, you just call function `handle_mmio()` right in the app. >> So, apps can live without interrupts and they still be able to handle >> request. > > So remember that "interrupt" and "event" are basically the same as > "structured callback". When anything happens that Xen wants to tell the > EL0 app about, it has to have a way of telling it. If the EL0 app is > handling a device, it has to have some way of getting interrupts from > that device; if it needs to emulate devices sent to the guest, it needs > some way to tell Xen to deliver an interrupt to the guest. Basically yes. There should be mechanism to request something from native application. Question is how this mechanism can be implemented. Classical approach is a even-driven loop: while(1) { wait_for_event(); handle_event_event(); return_back_results(); } wait_for_event() can by anything from WFI instruction to read() on socket. This is how stubdoms are working. I agree with you: there are no sense to repeat this in native apps. > Now, we could make the EL0 app interface "interruptless". Xen could > write information about pending events in a shared memory region, and > the EL0 app could check that before calling some sort of block() > hypercall, and check it again when it returns from the block() call. > But the shared event information starts to look an awful lot like events > and/or pending bits on an interrupt controller -- the only difference > being that you aren't interrupted if you're already running. Actually there are third way, which I have used. I described it in original email (check out [1]). Basically, native application is dead until it is needed by hypervisor. When hypervisor wants some services from app, it setups parameters, switches mode to EL0 and jumps at app entry point. > I'm pretty sure you could run in this mode using the existing interfaces > if you didn't want the hassle of dealing with asynchrony. If that's the > case, then why bother inventing an entirely new interface, with its own > bugs and duplication of functionality? Why not just use what we already > have? Because we are concerned about latency. In my benchmark, my native app PoC is 1.6 times faster than stubdom. 
[1] http://marc.info/?l=xen-devel&m=149151018801649&w=2 -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
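A rough sketch of that synchronous model, for illustration only; the entry-point signature, the request layout and the app_return() trap are assumptions, not existing code.

    #include <stdint.h>

    /* Request block prepared by Xen before it switches to EL0 and jumps
     * to app_entry() -- layout assumed for illustration. */
    struct app_request {
        uint32_t op;                          /* 0 = MMIO read, 1 = MMIO write */
        uint64_t addr;
        uint64_t data;
    };

    extern uint64_t handle_mmio(uint64_t addr, uint64_t data, int is_write);
    /* Traps back into Xen (e.g. via SVC) with the result; never returns. */
    extern void app_return(uint64_t result) __attribute__((noreturn));

    /* Xen "calls" this: no event loop, no interrupts -- the app only runs
     * for the duration of one request. */
    void app_entry(struct app_request *req)
    {
        uint64_t result = 0;

        switch ( req->op )
        {
        case 0:
            result = handle_mmio(req->addr, 0, 0);
            break;
        case 1:
            handle_mmio(req->addr, req->data, 1);
            break;
        }

        app_return(result);
    }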
* Re: Notes on stubdoms and latency on ARM 2017-06-17 0:14 ` Volodymyr Babchuk @ 2017-06-19 9:37 ` George Dunlap 2017-06-19 17:54 ` Stefano Stabellini 2017-06-19 18:26 ` Volodymyr Babchuk 0 siblings, 2 replies; 49+ messages in thread From: George Dunlap @ 2017-06-19 9:37 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, Julien Grall, xen-devel On 17/06/17 01:14, Volodymyr Babchuk wrote: > Hello George, > > On 31 May 2017 at 20:02, George Dunlap <george.dunlap@citrix.com> wrote: >>>> There is no way out: if the stubdom needs events, then we'll have to >>>> expose and context switch the vGIC. If it doesn't, then we can skip the >>>> vGIC. However, we would have a similar problem with EL0 apps: I am >>>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>>> then they might need something like a vGIC. >>> Hm. Correct me, but if we want make stubdom to handle some requests >>> (e.g. emulate MMIO access), then it needs events, and thus it needs >>> interrupts. At least, I'm not aware about any other mechanism, that >>> allows hypervisor to signal to a domain. >>> On other hand, EL0 app (as I see them) does not need such events. >>> Basically, you just call function `handle_mmio()` right in the app. >>> So, apps can live without interrupts and they still be able to handle >>> request. >> >> So remember that "interrupt" and "event" are basically the same as >> "structured callback". When anything happens that Xen wants to tell the >> EL0 app about, it has to have a way of telling it. If the EL0 app is >> handling a device, it has to have some way of getting interrupts from >> that device; if it needs to emulate devices sent to the guest, it needs >> some way to tell Xen to deliver an interrupt to the guest. > Basically yes. There should be mechanism to request something from > native application. Question is how this mechanism can be implemented. > Classical approach is a even-driven loop: > > while(1) { > wait_for_event(); > handle_event_event(); > return_back_results(); > } > > wait_for_event() can by anything from WFI instruction to read() on > socket. This is how stubdoms are working. I agree with you: there are > no sense to repeat this in native apps. > >> Now, we could make the EL0 app interface "interruptless". Xen could >> write information about pending events in a shared memory region, and >> the EL0 app could check that before calling some sort of block() >> hypercall, and check it again when it returns from the block() call. > >> But the shared event information starts to look an awful lot like events >> and/or pending bits on an interrupt controller -- the only difference >> being that you aren't interrupted if you're already running. > > Actually there are third way, which I have used. I described it in > original email (check out [1]). > Basically, native application is dead until it is needed by > hypervisor. When hypervisor wants some services from app, it setups > parameters, switches mode to EL0 and jumps at app entry point. What's the difference between "jumps to an app entry point" and "jumps to an interrupt handling routine"? And what's the difference between "Tells Xen about the location of the app entry point" and "tells Xen about the location of the interrupt handling routine"? If you want this "EL0 app" thing to be able to provide extra security over just running the code inside of Xen, then the code must not be able to DoS the host by spinning forever instead of returning. 
What happens if two different pcpus in Xen decide they want to activate some "app" functionality? >> I'm pretty sure you could run in this mode using the existing interfaces >> if you didn't want the hassle of dealing with asynchrony. If that's the >> case, then why bother inventing an entirely new interface, with its own >> bugs and duplication of functionality? Why not just use what we already >> have? > Because we are concerned about latency. In my benchmark, my native app > PoC is 1.6 times faster than stubdom. But given the conversation so far, it seems likely that that is mainly due to the fact that context switching on ARM has not been optimized. Just to be clear -- I'm not adamantly opposed to a new interface similar to what you're describing above. But I would be opposed to introducing a new interface that doesn't achieve the stated goals (more secure, &c), or a new interface that is the same as the old one but rewritten a bit. The point of having this design discussion up front is to prevent a situation where you spend months coding up something which is ultimately rejected. There are a lot of things that are hard to predict until there's actually code to review, but at the moment the "jumps to an interrupt handling routine" approach looks unpromising. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-19 9:37 ` George Dunlap @ 2017-06-19 17:54 ` Stefano Stabellini 2017-06-19 18:36 ` Volodymyr Babchuk 2017-06-19 18:26 ` Volodymyr Babchuk 1 sibling, 1 reply; 49+ messages in thread From: Stefano Stabellini @ 2017-06-19 17:54 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Volodymyr Babchuk, Dario Faggioli, Julien Grall, xen-devel On Mon, 19 Jun 2017, George Dunlap wrote: > On 17/06/17 01:14, Volodymyr Babchuk wrote: > > Hello George, > > > > On 31 May 2017 at 20:02, George Dunlap <george.dunlap@citrix.com> wrote: > >>>> There is no way out: if the stubdom needs events, then we'll have to > >>>> expose and context switch the vGIC. If it doesn't, then we can skip the > >>>> vGIC. However, we would have a similar problem with EL0 apps: I am > >>>> assuming that EL0 apps don't need to handle interrupts, but if they do, > >>>> then they might need something like a vGIC. > >>> Hm. Correct me, but if we want make stubdom to handle some requests > >>> (e.g. emulate MMIO access), then it needs events, and thus it needs > >>> interrupts. At least, I'm not aware about any other mechanism, that > >>> allows hypervisor to signal to a domain. > >>> On other hand, EL0 app (as I see them) does not need such events. > >>> Basically, you just call function `handle_mmio()` right in the app. > >>> So, apps can live without interrupts and they still be able to handle > >>> request. > >> > >> So remember that "interrupt" and "event" are basically the same as > >> "structured callback". When anything happens that Xen wants to tell the > >> EL0 app about, it has to have a way of telling it. If the EL0 app is > >> handling a device, it has to have some way of getting interrupts from > >> that device; if it needs to emulate devices sent to the guest, it needs > >> some way to tell Xen to deliver an interrupt to the guest. > > Basically yes. There should be mechanism to request something from > > native application. Question is how this mechanism can be implemented. > > Classical approach is a even-driven loop: > > > > while(1) { > > wait_for_event(); > > handle_event_event(); > > return_back_results(); > > } > > > > wait_for_event() can by anything from WFI instruction to read() on > > socket. This is how stubdoms are working. I agree with you: there are > > no sense to repeat this in native apps. > > > >> Now, we could make the EL0 app interface "interruptless". Xen could > >> write information about pending events in a shared memory region, and > >> the EL0 app could check that before calling some sort of block() > >> hypercall, and check it again when it returns from the block() call. > > > >> But the shared event information starts to look an awful lot like events > >> and/or pending bits on an interrupt controller -- the only difference > >> being that you aren't interrupted if you're already running. > > > > Actually there are third way, which I have used. I described it in > > original email (check out [1]). > > Basically, native application is dead until it is needed by > > hypervisor. When hypervisor wants some services from app, it setups > > parameters, switches mode to EL0 and jumps at app entry point. > > What's the difference between "jumps to an app entry point" and "jumps > to an interrupt handling routine"? And what's the difference between > "Tells Xen about the location of the app entry point" and "tells Xen > about the location of the interrupt handling routine"? 
> > If you want this "EL0 app" thing to be able to provide extra security > over just running the code inside of Xen, then the code must not be able > to DoS the host by spinning forever instead of returning. I think that the "extra security" was mostly Julien's and my goal. Volodymyr would be OK with having the code in Xen, if I recall correctly from past conversations. In any case, wouldn't the usual Xen timer interrupt prevent this scenario from happening? > What happens if two different pcpus in Xen decide they want to activate > some "app" functionality? It should work fine as long as the app code is written to be able to cope with it (spin_locks, etc). > >> I'm pretty sure you could run in this mode using the existing interfaces > >> if you didn't want the hassle of dealing with asynchrony. If that's the > >> case, then why bother inventing an entirely new interface, with its own > >> bugs and duplication of functionality? Why not just use what we already > >> have? > > Because we are concerned about latency. In my benchmark, my native app > > PoC is 1.6 times faster than stubdom. > > But given the conversation so far, it seems likely that that is mainly > due to the fact that context switching on ARM has not been optimized. True. However, Volodymyr took the time to demonstrate the performance of EL0 apps vs. stubdoms with a PoC, which is much more than most Xen contributors do. Nodoby provided numbers for a faster ARM context switch yet. I don't know on whom should fall the burden of proving that a lighter context switch can match the EL0 app numbers. I am not sure it would be fair to ask Volodymyr to do it. > Just to be clear -- I'm not adamantly opposed to a new interface similar > to what you're describing above. But I would be opposed to introducing > a new interface that doesn't achieve the stated goals (more secure, &c), > or a new interface that is the same as the old one but rewritten a bit. > > The point of having this design discussion up front is to prevent a > situation where you spend months coding up something which is ultimately > rejected. There are a lot of things that are hard to predict until > there's actually code to review, but at the moment the "jumps to an > interrupt handling routine" approach looks unpromising. Did you mean "jumps to a app entry point" or "jumps to an interrupt handling routine"? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
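To illustrate the point about concurrent activations: an app handler that can be entered on several pCPUs at once has to serialise access to its own shared state, for instance with a small lock of its own. A sketch only; nothing below is an existing interface.

    #include <stdint.h>

    /* Minimal ticket lock the app could use to survive being entered on
     * several pCPUs at once -- illustrative sketch only. */
    typedef struct {
        volatile uint32_t next, owner;
    } app_lock_t;

    static void app_lock(app_lock_t *l)
    {
        uint32_t ticket = __atomic_fetch_add(&l->next, 1, __ATOMIC_ACQUIRE);

        while ( __atomic_load_n(&l->owner, __ATOMIC_ACQUIRE) != ticket )
            ;                                 /* spin */
    }

    static void app_unlock(app_lock_t *l)
    {
        __atomic_store_n(&l->owner, l->owner + 1, __ATOMIC_RELEASE);
    }

    static app_lock_t state_lock;
    static uint64_t device_state;             /* emulated device state */

    /* Entered by Xen on whichever pCPU trapped the guest access. */
    void app_handle_write(uint64_t val)
    {
        app_lock(&state_lock);
        device_state = val;                   /* serialised update */
        app_unlock(&state_lock);
    }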
* Re: Notes on stubdoms and latency on ARM 2017-06-19 17:54 ` Stefano Stabellini @ 2017-06-19 18:36 ` Volodymyr Babchuk 2017-06-20 10:11 ` Dario Faggioli 2017-06-20 10:45 ` Julien Grall 0 siblings, 2 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-06-19 18:36 UTC (permalink / raw) To: Stefano Stabellini Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli, George Dunlap, Julien Grall Hi Stefano, On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@kernel.org> wrote: >> But given the conversation so far, it seems likely that that is mainly >> due to the fact that context switching on ARM has not been optimized. > > True. However, Volodymyr took the time to demonstrate the performance of > EL0 apps vs. stubdoms with a PoC, which is much more than most Xen > contributors do. Nodoby provided numbers for a faster ARM context switch > yet. I don't know on whom should fall the burden of proving that a > lighter context switch can match the EL0 app numbers. I am not sure it > would be fair to ask Volodymyr to do it. Thanks. Actually, we discussed this topic internally today. Main concern today is not a SMCs and OP-TEE (I will be happy to do this right in XEN), but vcopros and GPU virtualization. Because of legal issues, we can't put this in XEN. And because of vcpu framework nature we will need multiple calls to vgpu driver per one vcpu context switch. I'm going to create worst case scenario, where multiple vcpu are active and there are no free pcpu, to see how credit or credit2 scheduler will call my stubdom. Also, I'm very interested in Julien's idea about stubdom without GIC. Probably, I'll try to hack something like that to see how it will affect overall switching latency. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-19 18:36 ` Volodymyr Babchuk @ 2017-06-20 10:11 ` Dario Faggioli 2017-07-07 15:02 ` Volodymyr Babchuk 2017-06-20 10:45 ` Julien Grall 1 sibling, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-06-20 10:11 UTC (permalink / raw) To: Volodymyr Babchuk, Stefano Stabellini Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov, George Dunlap [-- Attachment #1.1: Type: text/plain, Size: 1914 bytes --] On Mon, 2017-06-19 at 11:36 -0700, Volodymyr Babchuk wrote: > On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@kernel.org> > wrote: > > True. However, Volodymyr took the time to demonstrate the > > performance of > > EL0 apps vs. stubdoms with a PoC, which is much more than most Xen > > contributors do. Nodoby provided numbers for a faster ARM context > > switch > > yet. I don't know on whom should fall the burden of proving that a > > lighter context switch can match the EL0 app numbers. I am not sure > > it > > would be fair to ask Volodymyr to do it. > > Thanks. Actually, we discussed this topic internally today. Main > concern today is not a SMCs and OP-TEE (I will be happy to do this > right in XEN), but vcopros and GPU virtualization. Because of legal > issues, we can't put this in XEN. And because of vcpu framework > nature > we will need multiple calls to vgpu driver per one vcpu context > switch. > I'm going to create worst case scenario, where multiple vcpu are > active and there are no free pcpu, to see how credit or credit2 > scheduler will call my stubdom. > Well, that would be interesting and useful, thanks for offering doing that. Let's just keep in mind, though, that, if the numbers will turn out to be bad (and we manage to trace that back to being due to scheduling), then: 1) we can create a mechanism that bypasses the scheduler, 2) we can change the way stubdom are scheduled. Option 2) is something generic, would (most likely) benefit other use cases too, and we've said many times we'd be up for it... so let's please just not rule it out... :-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-20 10:11 ` Dario Faggioli @ 2017-07-07 15:02 ` Volodymyr Babchuk 2017-07-07 16:41 ` Dario Faggioli 0 siblings, 1 reply; 49+ messages in thread From: Volodymyr Babchuk @ 2017-07-07 15:02 UTC (permalink / raw) To: Dario Faggioli Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, George Dunlap, Julien Grall, Stefano Stabellini Hello Dario, On 20 June 2017 at 13:11, Dario Faggioli <dario.faggioli@citrix.com> wrote: > On Mon, 2017-06-19 at 11:36 -0700, Volodymyr Babchuk wrote: >> On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@kernel.org> >> wrote: >> > True. However, Volodymyr took the time to demonstrate the >> > performance of >> > EL0 apps vs. stubdoms with a PoC, which is much more than most Xen >> > contributors do. Nodoby provided numbers for a faster ARM context >> > switch >> > yet. I don't know on whom should fall the burden of proving that a >> > lighter context switch can match the EL0 app numbers. I am not sure >> > it >> > would be fair to ask Volodymyr to do it. >> >> Thanks. Actually, we discussed this topic internally today. Main >> concern today is not a SMCs and OP-TEE (I will be happy to do this >> right in XEN), but vcopros and GPU virtualization. Because of legal >> issues, we can't put this in XEN. And because of vcpu framework >> nature >> we will need multiple calls to vgpu driver per one vcpu context >> switch. >> I'm going to create worst case scenario, where multiple vcpu are >> active and there are no free pcpu, to see how credit or credit2 >> scheduler will call my stubdom. >> > Well, that would be interesting and useful, thanks for offering doing > that. Yeah, so I did that. And I have get some puzzling results. I don't know why, but when I have 4 (or less) active vcpus on 4 pcpus, my test takes about 1 second to execute. But if there are 5 (or mode) active vcpus on 4 pcpus, it executes from 80 to 110 seconds. There will be the details, but first let me remind you my setup. I'm testing on ARM64 machine with 4 Cortex A57 cores. I wrote special test driver for linux, that calls SMC instruction 100 000 times. Also I hacked miniOS to act as monitor for DomU. This means that XEN traps SMC invocation and asks MiniOS to handle this. So, every SMC is handled in this way: DomU->XEN->MiniOS->XEN->DomU. Now, let's get back to results. ** Case 1: - Dom0 has 4 vcpus and is idle - DomU has 4 vcpus and is idle - Minios has 1 vcpu and is not idle, because it's scheduler does not calls WFI. I run test in DomU: root@salvator-x-h3-xt:~# time -p cat /proc/smc_bench Will call SMC 100000 time(s) Done! real 1.10 user 0.00 sys 1.10 ** Case 2: - Dom0 has 4 vcpus. They all are executing endless loop with sh oneliner: # while : ; do : ; done & - DomU has 4 vcpus and is idle - Minios has 1 vcpu and is not idle, because it's scheduler does not calls WFI. - In total there are 6 vcpus active I run test in DomU: real 113.08 user 0.00 sys 113.04 ** Case 3: - Dom0 has 4 vcpus. Three of them are executing endless loop with sh oneliner: # while : ; do : ; done & - DomU has 4 vcpus and is idle - Minios has 1 vcpu and is not idle, because it's scheduler does not calls WFI. - In total there are 5 vcpus active I run test in DomU: real 88.55 user 0.00 sys 88.54 ** Case 4: - Dom0 has 4 vcpus. Two of them are executing endless loop with sh oneliner: # while : ; do : ; done & - DomU has 4 vcpus and is idle - Minios has 1 vcpu and is not idle, because it's scheduler does not calls WFI. 
- In total there are 4 vcpus active I run test in DomU: real 1.11 user 0.00 sys 1.11 ** Case 5: - Dom0 has 4 vcpus and is idle. - DomU has 4 vcpus. Three of them are executing endless loop with sh oneliner: # while : ; do : ; done & - Minios have 1 vcpu and is not idle, because it's scheduler does not calls WFI. - In total there are 5 vcpus active I run test in DomU: real 100.96 user 0.00 sys 100.94 ** Case 6: - Dom0 has 4 vcpus and is idle. - DomU has 4 vcpus. Two of them are executing endless loop with sh oneliner: # while : ; do : ; done & - Minios have 1 vcpu and is not idle, because it's scheduler does not calls WFI. - In total there are 4 vcpus active I run test in DomU: real 1.11 user 0.00 sys 1.10 * Case 7 - Dom0 has 4 vcpus and is idle. - DomU has 4 vcpus. Two of them are executing endless loop with sh oneliner: # while : ; do : ; done & - Minios have 1 vcpu and is not idle, because it's scheduler does not calls WFI. - *Minios is running on separate cpu pool with 1 pcpu*: Name CPUs Sched Active Domain count Pool-0 3 credit y 2 minios 1 credit y 1 I run test in DomU: real 1.11 user 0.00 sys 1.10 * Case 8 - Dom0 has 4 vcpus and is idle. - DomU has 4 vcpus. Three of them are executing endless loop with sh oneliner: # while : ; do : ; done & - Minios have 1 vcpu and is not idle, because it's scheduler does not calls WFI. - Minios is running on separate cpu pool with 1 pcpu: I run test in DomU: real 100.12 user 0.00 sys 100.11 As you can see, I tried to move minios to separate cpu pool. But it didn't helped a lot. Name ID Mem VCPUs State Time(s) Cpupool Domain-0 0 752 4 r----- 1566.1 Pool-0 DomU 1 255 4 -b---- 4535.1 Pool-0 mini-os 2 128 1 r----- 2395.7 minios I expected that it would be 20% to 50% slower, when there are more vCPUs than pCPUs. But it is 100 times slower and I can't explain this. Probably, something is very broken in my XEN. But I used 4.9 with some hacks to make minios work. I didn't touched scheduler at all. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
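The test driver described above could be approximated by a small Linux module along these lines. This is a sketch rather than the actual driver used for the numbers; the SMC function ID is a placeholder.

    #include <linux/module.h>
    #include <linux/proc_fs.h>
    #include <linux/seq_file.h>
    #include <linux/ktime.h>
    #include <linux/arm-smccc.h>

    #define SMC_BENCH_FID   0x82000001UL      /* placeholder function id */
    #define SMC_BENCH_ITERS 100000

    /* Reading /proc/smc_bench issues the SMCs and reports elapsed time. */
    static int smc_bench_show(struct seq_file *m, void *v)
    {
        struct arm_smccc_res res;
        ktime_t start, end;
        unsigned long i;

        seq_printf(m, "Will call SMC %d time(s)\n", SMC_BENCH_ITERS);

        start = ktime_get();
        for (i = 0; i < SMC_BENCH_ITERS; i++)
            arm_smccc_smc(SMC_BENCH_FID, 0, 0, 0, 0, 0, 0, 0, &res);
        end = ktime_get();

        seq_printf(m, "Done! (%lld ms)\n",
                   (long long)ktime_ms_delta(end, start));
        return 0;
    }

    static int smc_bench_open(struct inode *inode, struct file *file)
    {
        return single_open(file, smc_bench_show, NULL);
    }

    static const struct file_operations smc_bench_fops = {
        .owner   = THIS_MODULE,
        .open    = smc_bench_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
    };

    static int __init smc_bench_init(void)
    {
        return proc_create("smc_bench", 0444, NULL, &smc_bench_fops)
               ? 0 : -ENOMEM;
    }

    static void __exit smc_bench_exit(void)
    {
        remove_proc_entry("smc_bench", NULL);
    }

    module_init(smc_bench_init);
    module_exit(smc_bench_exit);
    MODULE_LICENSE("GPL");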
* Re: Notes on stubdoms and latency on ARM 2017-07-07 15:02 ` Volodymyr Babchuk @ 2017-07-07 16:41 ` Dario Faggioli 2017-07-07 17:03 ` Volodymyr Babchuk 0 siblings, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-07-07 16:41 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, George Dunlap, Julien Grall, Stefano Stabellini [-- Attachment #1.1: Type: text/plain, Size: 6061 bytes --] On Fri, 2017-07-07 at 18:02 +0300, Volodymyr Babchuk wrote: > Hello Dario, > Hi! > On 20 June 2017 at 13:11, Dario Faggioli <dario.faggioli@citrix.com> > wrote: > > On Mon, 2017-06-19 at 11:36 -0700, Volodymyr Babchuk wrote: > > > > > > Thanks. Actually, we discussed this topic internally today. Main > > > concern today is not a SMCs and OP-TEE (I will be happy to do > > > this > > > right in XEN), but vcopros and GPU virtualization. Because of > > > legal > > > issues, we can't put this in XEN. And because of vcpu framework > > > nature > > > we will need multiple calls to vgpu driver per one vcpu context > > > switch. > > > I'm going to create worst case scenario, where multiple vcpu are > > > active and there are no free pcpu, to see how credit or credit2 > > > scheduler will call my stubdom. > > > > > > > Well, that would be interesting and useful, thanks for offering > > doing > > that. > > Yeah, so I did that. > Ok, great! Thanks for doing and reporting about this. :-D > And I have get some puzzling results. I don't know why, > but when I have 4 (or less) active vcpus on 4 pcpus, my test takes > about 1 second to execute. > But if there are 5 (or mode) active vcpus on 4 pcpus, it executes > from > 80 to 110 seconds. > I see. So, I've got just a handful of minutes right now, to only quickly look at the result and ask a couple of questions. Will think about this more in the coming days... > There will be the details, but first let me remind you my setup. > I'm testing on ARM64 machine with 4 Cortex A57 cores. I wrote > special test driver for linux, that calls SMC instruction 100 000 > times. > Also I hacked miniOS to act as monitor for DomU. This means that > XEN traps SMC invocation and asks MiniOS to handle this. > Ok. > So, every SMC is handled in this way: > > DomU->XEN->MiniOS->XEN->DomU. > Right. Nice work again. > Now, let's get back to results. > > ** Case 1: > - Dom0 has 4 vcpus and is idle > - DomU has 4 vcpus and is idle > - Minios has 1 vcpu and is not idle, because it's scheduler does > not calls WFI. > I run test in DomU: > > root@salvator-x-h3-xt:~# time -p cat /proc/smc_bench > Will call SMC 100000 time(s) > So, given what you said above, this means that the vCPU that is running this will frequently block (when calling SMC) and resume (when SMC is handled) quite frequently, right? Also, are you sure (e.g., because of how the Linux driver is done) that this always happen on one vCPU? > Done! > real 1.10 > user 0.00 > sys 1.10 > ** Case 2: > - Dom0 has 4 vcpus. They all are executing endless loop with sh > oneliner: > # while : ; do : ; done & > - DomU has 4 vcpus and is idle > - Minios has 1 vcpu and is not idle, because it's scheduler does not > calls WFI. > Ah, I see. This is unideal IMO. It's fine for this POC, of course, but I guess you've got plans to change this (if we decide to go the stubdom route)? > - In total there are 6 vcpus active > > I run test in DomU: > real 113.08 > user 0.00 > sys 113.04 > Ok, so there's contention for pCPUs. 
Dom0's vCPUs are CPU hogs, while, if my assumption above is correct, the "SMC vCPU" of the DomU is I/O bound, in the sense that it blocks on an operation --which turns out to be SMC call to MiniOS-- then resumes and block again almost immediately. Since you are using Credit, can you try to disable context switch rate limiting? Something like: # xl sched-credit -s -r 0 should work. This looks to me like one of those typical scenario where rate limiting is counterproductive. In fact, every time that your SMC vCPU is woken up, despite being boosted, it finds all the pCPUs busy, and it can't preempt any of the vCPUs that are running there, until rate limiting expires. That means it has to wait an interval of time that varies between 0 and 1ms. This happens 100000 times, and 1ms*100000 is 100 seconds... Which is roughly how the test takes, in the overcommitted case. > * Case 7 > - Dom0 has 4 vcpus and is idle. > - DomU has 4 vcpus. Two of them are executing endless loop with sh > oneliner: > # while : ; do : ; done & > - Minios have 1 vcpu and is not idle, because it's scheduler does not > calls WFI. > - *Minios is running on separate cpu pool with 1 pcpu*: > Name CPUs Sched Active Domain count > Pool-0 3 credit y 2 > minios 1 credit y 1 > > I run test in DomU: > real 1.11 > user 0.00 > sys 1.10 > > * Case 8 > - Dom0 has 4 vcpus and is idle. > - DomU has 4 vcpus. Three of them are executing endless loop with sh > oneliner: > # while : ; do : ; done & > - Minios have 1 vcpu and is not idle, because it's scheduler does not > calls WFI. > - Minios is running on separate cpu pool with 1 pcpu: > > I run test in DomU: > real 100.12 > user 0.00 > sys 100.11 > > > As you can see, I tried to move minios to separate cpu pool. But it > didn't helped a lot. > Yes, but it again makes sense. In fact, now there are 3 CPUs in Pool-0, and all are kept always busy by the the 3 DomU vCPUs running endless loops. So, when the DomU's SMC vCPU wakes up, has again to wait for the rate limit to expire on one of them. > I expected that it would be 20% to 50% slower, when there are more > vCPUs than pCPUs. But it is 100 times slower and I can't explain > this. > Probably, something is very broken in my XEN. But I used 4.9 with > some > hacks to make minios work. I didn't touched scheduler at all. > If you can, try with rate limiting off and let me know. :-D Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
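In simplified form, the rate-limiting behaviour described here boils down to the following check, made whenever the scheduler is invoked (an illustrative sketch, not the verbatim Xen source):

    #include <stdbool.h>
    #include <stdint.h>

    typedef int64_t s_time_t;                 /* nanoseconds, as in Xen */
    #define MICROSECS(us) ((s_time_t)(us) * 1000)

    /*
     * curr_is_runnable: the vCPU currently on this pCPU still wants to run
     * runtime:          how long it has run in its current stint
     * ratelimit_us:     the rate-limit value in microseconds (1000 by
     *                   default; "xl sched-credit -s -r 0" sets it to 0)
     */
    static bool keep_current_vcpu(bool curr_is_runnable, s_time_t runtime,
                                  unsigned int ratelimit_us)
    {
        /*
         * Even when a just-woken (boosted) vCPU is waiting, the vCPU
         * already on the pCPU is kept until it has consumed ratelimit_us
         * of CPU time.  That 0..1ms wait, paid on each of the 100000
         * wakeups, is what adds up to the ~100s runtimes seen in the
         * overcommitted cases.
         */
        return ratelimit_us && curr_is_runnable &&
               runtime < MICROSECS(ratelimit_us);
    }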
* Re: Notes on stubdoms and latency on ARM 2017-07-07 16:41 ` Dario Faggioli @ 2017-07-07 17:03 ` Volodymyr Babchuk 2017-07-07 21:12 ` Stefano Stabellini 2017-07-08 14:26 ` Dario Faggioli 0 siblings, 2 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-07-07 17:03 UTC (permalink / raw) To: Dario Faggioli Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, George Dunlap, Julien Grall, Stefano Stabellini Hi again, On 7 July 2017 at 09:41, Dario Faggioli <dario.faggioli@citrix.com> wrote: > On Fri, 2017-07-07 at 18:02 +0300, Volodymyr Babchuk wrote: >> Hello Dario, >> > Hi! > >> On 20 June 2017 at 13:11, Dario Faggioli <dario.faggioli@citrix.com> >> wrote: >> > On Mon, 2017-06-19 at 11:36 -0700, Volodymyr Babchuk wrote: >> > > >> > > Thanks. Actually, we discussed this topic internally today. Main >> > > concern today is not a SMCs and OP-TEE (I will be happy to do >> > > this >> > > right in XEN), but vcopros and GPU virtualization. Because of >> > > legal >> > > issues, we can't put this in XEN. And because of vcpu framework >> > > nature >> > > we will need multiple calls to vgpu driver per one vcpu context >> > > switch. >> > > I'm going to create worst case scenario, where multiple vcpu are >> > > active and there are no free pcpu, to see how credit or credit2 >> > > scheduler will call my stubdom. >> > > >> > >> > Well, that would be interesting and useful, thanks for offering >> > doing >> > that. >> >> Yeah, so I did that. >> > Ok, great! Thanks for doing and reporting about this. :-D > >> And I have get some puzzling results. I don't know why, >> but when I have 4 (or less) active vcpus on 4 pcpus, my test takes >> about 1 second to execute. >> But if there are 5 (or mode) active vcpus on 4 pcpus, it executes >> from >> 80 to 110 seconds. >> > I see. So, I've got just a handful of minutes right now, to only > quickly look at the result and ask a couple of questions. Will think > about this more in the coming days... > >> There will be the details, but first let me remind you my setup. >> I'm testing on ARM64 machine with 4 Cortex A57 cores. I wrote >> special test driver for linux, that calls SMC instruction 100 000 >> times. >> Also I hacked miniOS to act as monitor for DomU. This means that >> XEN traps SMC invocation and asks MiniOS to handle this. >> > Ok. > >> So, every SMC is handled in this way: >> >> DomU->XEN->MiniOS->XEN->DomU. >> > Right. Nice work again. > >> Now, let's get back to results. >> >> ** Case 1: >> - Dom0 has 4 vcpus and is idle >> - DomU has 4 vcpus and is idle >> - Minios has 1 vcpu and is not idle, because it's scheduler does >> not calls WFI. >> I run test in DomU: >> >> root@salvator-x-h3-xt:~# time -p cat /proc/smc_bench >> Will call SMC 100000 time(s) >> > So, given what you said above, this means that the vCPU that is running > this will frequently block (when calling SMC) and resume (when SMC is > handled) quite frequently, right? Yes, exactly. There is vm_event_vcpu_pause(v) call in monitor.c > > Also, are you sure (e.g., because of how the Linux driver is done) that > this always happen on one vCPU? No, I can't guarantee that. Linux driver is single threaded, but I did nothing to pin in to a certain CPU. > >> Done! >> real 1.10 >> user 0.00 >> sys 1.10 > >> ** Case 2: >> - Dom0 has 4 vcpus. They all are executing endless loop with sh >> oneliner: >> # while : ; do : ; done & >> - DomU has 4 vcpus and is idle >> - Minios has 1 vcpu and is not idle, because it's scheduler does not >> calls WFI. >> > Ah, I see. This is unideal IMO. 
It's fine for this POC, of course, but > I guess you've got plans to change this (if we decide to go the stubdom > route)? Sure. There is much to be done in MiniOS to make it production-grade. > >> - In total there are 6 vcpus active >> >> I run test in DomU: >> real 113.08 >> user 0.00 >> sys 113.04 >> > Ok, so there's contention for pCPUs. Dom0's vCPUs are CPU hogs, while, > if my assumption above is correct, the "SMC vCPU" of the DomU is I/O > bound, in the sense that it blocks on an operation --which turns out to > be SMC call to MiniOS-- then resumes and block again almost > immediately. > > Since you are using Credit, can you try to disable context switch rate > limiting? Something like: > > # xl sched-credit -s -r 0 > > should work. Yep. You are right. In the environment described above (Case 2) I now get much better results: real 1.85 user 0.00 sys 1.85 > This looks to me like one of those typical scenario where rate limiting > is counterproductive. In fact, every time that your SMC vCPU is woken > up, despite being boosted, it finds all the pCPUs busy, and it can't > preempt any of the vCPUs that are running there, until rate limiting > expires. > > That means it has to wait an interval of time that varies between 0 and > 1ms. This happens 100000 times, and 1ms*100000 is 100 seconds... Which > is roughly how the test takes, in the overcommitted case. Yes, it looks like that was the case. Does this mean that ratelimiting should be disabled for any domain that is backed by a device model? AFAIK, device models work in exactly the same way. >> * Case 7 >> - Dom0 has 4 vcpus and is idle. >> - DomU has 4 vcpus. Two of them are executing endless loop with sh >> oneliner: >> # while : ; do : ; done & >> - Minios have 1 vcpu and is not idle, because it's scheduler does not >> calls WFI. >> - *Minios is running on separate cpu pool with 1 pcpu*: >> Name CPUs Sched Active Domain count >> Pool-0 3 credit y 2 >> minios 1 credit y 1 >> >> I run test in DomU: >> real 1.11 >> user 0.00 >> sys 1.10 >> >> * Case 8 >> - Dom0 has 4 vcpus and is idle. >> - DomU has 4 vcpus. Three of them are executing endless loop with sh >> oneliner: >> # while : ; do : ; done & >> - Minios have 1 vcpu and is not idle, because it's scheduler does not >> calls WFI. >> - Minios is running on separate cpu pool with 1 pcpu: >> >> I run test in DomU: >> real 100.12 >> user 0.00 >> sys 100.11 >> >> >> As you can see, I tried to move minios to separate cpu pool. But it >> didn't helped a lot. >> > Yes, but it again makes sense. In fact, now there are 3 CPUs in Pool-0, > and all are kept always busy by the the 3 DomU vCPUs running endless > loops. So, when the DomU's SMC vCPU wakes up, has again to wait for the > rate limit to expire on one of them. Yes, as this was caused by the ratelimit, it makes perfect sense. Thank you. I tried a number of different cases. Now execution time depends linearly on the number of over-committed vCPUs (about +200ms for every busy vCPU). That is what I expected. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-07 17:03 ` Volodymyr Babchuk @ 2017-07-07 21:12 ` Stefano Stabellini 2017-07-12 6:14 ` Dario Faggioli 2017-07-08 14:26 ` Dario Faggioli 1 sibling, 1 reply; 49+ messages in thread From: Stefano Stabellini @ 2017-07-07 21:12 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli, George Dunlap, Julien Grall, Stefano Stabellini On Fri, 7 Jul 2017, Volodymyr Babchuk wrote: > >> I run test in DomU: > >> real 113.08 > >> user 0.00 > >> sys 113.04 > >> > > Ok, so there's contention for pCPUs. Dom0's vCPUs are CPU hogs, while, > > if my assumption above is correct, the "SMC vCPU" of the DomU is I/O > > bound, in the sense that it blocks on an operation --which turns out to > > be SMC call to MiniOS-- then resumes and block again almost > > immediately. > > > > Since you are using Credit, can you try to disable context switch rate > > limiting? Something like: > > > > # xl sched-credit -s -r 0 > > > > should work. > Yep. You are right. In the environment described above (Case 2) I now > get much better results: > > real 1.85 > user 0.00 > sys 1.85 From 113 to 1.85 -- WOW! Obviously I am no scheduler expert, but shouldn't we advertise a bit better a scheduler configuration option that makes things _one hundred times faster_ ?! It's not even mentioned in https://wiki.xen.org/wiki/Tuning_Xen_for_Performance! Also, it is worrying to me that there are cases were, unless the user tweaks the configuration, she is going to get 100x worse performance out of her system. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-07 21:12 ` Stefano Stabellini @ 2017-07-12 6:14 ` Dario Faggioli 2017-07-17 9:25 ` George Dunlap 0 siblings, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-07-12 6:14 UTC (permalink / raw) To: Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov, George Dunlap [-- Attachment #1.1: Type: text/plain, Size: 2207 bytes --] On Fri, 2017-07-07 at 14:12 -0700, Stefano Stabellini wrote: > On Fri, 7 Jul 2017, Volodymyr Babchuk wrote: > > > > > > > Since you are using Credit, can you try to disable context switch > > > rate > > > limiting? > > > > Yep. You are right. In the environment described above (Case 2) I > > now > > get much better results: > > > > real 1.85 > > user 0.00 > > sys 1.85 > > From 113 to 1.85 -- WOW! > > Obviously I am no scheduler expert, but shouldn't we advertise a bit > better a scheduler configuration option that makes things _one > hundred > times faster_ ?! > So, to be fair, so far, we've bitten this hard by this only on artificially constructed test cases, where either some extreme assumption were made (e.g., that all the vCPUs except one always run at 100% load) or pinning was used in a weird and suboptimal way. And there are workload where it has been verified that it helps making performance better (poor SpecVIRT results without it was the main motivation having it upstream, and on by default). That being said, I personally have never liked rate-limiting, it always looked to me like the wrong solution. > It's not even mentioned in > https://wiki.xen.org/wiki/Tuning_Xen_for_Performance! > Well, for sure it should be mentioned here, you're right! > Also, it is worrying to me that there are cases were, unless the user > tweaks the configuration, she is going to get 100x worse performance > out > of her system. > As I said, it's hard to tell in advance whether it will have a good, bad, or really bad impact on a specific workload. I'm starting to think, though, that it may be good to switch to having it off by default, and then document that if the system is going into trashing because of too frequent context switches, turning it on may help. I'll think about it, and see if I'll be able to run some benchmarks with it on and off. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-12 6:14 ` Dario Faggioli @ 2017-07-17 9:25 ` George Dunlap 2017-07-17 10:04 ` Julien Grall 2017-07-20 8:49 ` Dario Faggioli 0 siblings, 2 replies; 49+ messages in thread From: George Dunlap @ 2017-07-17 9:25 UTC (permalink / raw) To: Dario Faggioli, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov On 07/12/2017 07:14 AM, Dario Faggioli wrote: > On Fri, 2017-07-07 at 14:12 -0700, Stefano Stabellini wrote: >> On Fri, 7 Jul 2017, Volodymyr Babchuk wrote: >>>>> >>>> Since you are using Credit, can you try to disable context switch >>>> rate >>>> limiting? >>> >>> Yep. You are right. In the environment described above (Case 2) I >>> now >>> get much better results: >>> >>> real 1.85 >>> user 0.00 >>> sys 1.85 >> >> From 113 to 1.85 -- WOW! >> >> Obviously I am no scheduler expert, but shouldn't we advertise a bit >> better a scheduler configuration option that makes things _one >> hundred >> times faster_ ?! >> > So, to be fair, so far, we've bitten this hard by this only on > artificially constructed test cases, where either some extreme > assumption were made (e.g., that all the vCPUs except one always run at > 100% load) or pinning was used in a weird and suboptimal way. And there > are workload where it has been verified that it helps making > performance better (poor SpecVIRT results without it was the main > motivation having it upstream, and on by default). > > That being said, I personally have never liked rate-limiting, it always > looked to me like the wrong solution. In fact, I *think* the only reason it may have been introduced is that there was a bug in the credit2 code at the time such that it always had a single runqueue no matter what your actual pcpu topology was. >> It's not even mentioned in >> https://wiki.xen.org/wiki/Tuning_Xen_for_Performance! >> > Well, for sure it should be mentioned here, you're right! > >> Also, it is worrying to me that there are cases were, unless the user >> tweaks the configuration, she is going to get 100x worse performance >> out >> of her system. >> > As I said, it's hard to tell in advance whether it will have a good, > bad, or really bad impact on a specific workload. > > I'm starting to think, though, that it may be good to switch to having > it off by default, and then document that if the system is going into > trashing because of too frequent context switches, turning it on may > help. > > I'll think about it, and see if I'll be able to run some benchmarks > with it on and off. Thanks. FYI the main benchmark that was used to justify its inclusion (and on by default) was specvirt (I think). -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-17 9:25 ` George Dunlap @ 2017-07-17 10:04 ` Julien Grall 2017-07-17 11:28 ` George Dunlap 2017-07-20 8:49 ` Dario Faggioli 1 sibling, 1 reply; 49+ messages in thread From: Julien Grall @ 2017-07-17 10:04 UTC (permalink / raw) To: George Dunlap, Dario Faggioli, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov Hi, On 17/07/17 10:25, George Dunlap wrote: > On 07/12/2017 07:14 AM, Dario Faggioli wrote: >> On Fri, 2017-07-07 at 14:12 -0700, Stefano Stabellini wrote: >>> On Fri, 7 Jul 2017, Volodymyr Babchuk wrote: >>>>>> >>>>> Since you are using Credit, can you try to disable context switch >>>>> rate >>>>> limiting? >>>> >>>> Yep. You are right. In the environment described above (Case 2) I >>>> now >>>> get much better results: >>>> >>>> real 1.85 >>>> user 0.00 >>>> sys 1.85 >>> >>> From 113 to 1.85 -- WOW! >>> >>> Obviously I am no scheduler expert, but shouldn't we advertise a bit >>> better a scheduler configuration option that makes things _one >>> hundred >>> times faster_ ?! >>> >> So, to be fair, so far, we've bitten this hard by this only on >> artificially constructed test cases, where either some extreme >> assumption were made (e.g., that all the vCPUs except one always run at >> 100% load) or pinning was used in a weird and suboptimal way. And there >> are workload where it has been verified that it helps making >> performance better (poor SpecVIRT results without it was the main >> motivation having it upstream, and on by default). >> >> That being said, I personally have never liked rate-limiting, it always >> looked to me like the wrong solution. > > In fact, I *think* the only reason it may have been introduced is that > there was a bug in the credit2 code at the time such that it always had > a single runqueue no matter what your actual pcpu topology was. FWIW, we don't yet parse the pCPU topology on ARM. AFAIU, we always tell Xen each CPU is in its own core. Will it have some implications in the scheduler? Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-17 10:04 ` Julien Grall @ 2017-07-17 11:28 ` George Dunlap 2017-07-19 11:21 ` Julien Grall 2017-07-20 9:10 ` Dario Faggioli 0 siblings, 2 replies; 49+ messages in thread From: George Dunlap @ 2017-07-17 11:28 UTC (permalink / raw) To: Julien Grall, Dario Faggioli, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov On 07/17/2017 11:04 AM, Julien Grall wrote: > Hi, > > On 17/07/17 10:25, George Dunlap wrote: >> On 07/12/2017 07:14 AM, Dario Faggioli wrote: >>> On Fri, 2017-07-07 at 14:12 -0700, Stefano Stabellini wrote: >>>> On Fri, 7 Jul 2017, Volodymyr Babchuk wrote: >>>>>>> >>>>>> Since you are using Credit, can you try to disable context switch >>>>>> rate >>>>>> limiting? >>>>> >>>>> Yep. You are right. In the environment described above (Case 2) I >>>>> now >>>>> get much better results: >>>>> >>>>> real 1.85 >>>>> user 0.00 >>>>> sys 1.85 >>>> >>>> From 113 to 1.85 -- WOW! >>>> >>>> Obviously I am no scheduler expert, but shouldn't we advertise a bit >>>> better a scheduler configuration option that makes things _one >>>> hundred >>>> times faster_ ?! >>>> >>> So, to be fair, so far, we've bitten this hard by this only on >>> artificially constructed test cases, where either some extreme >>> assumption were made (e.g., that all the vCPUs except one always run at >>> 100% load) or pinning was used in a weird and suboptimal way. And there >>> are workload where it has been verified that it helps making >>> performance better (poor SpecVIRT results without it was the main >>> motivation having it upstream, and on by default). >>> >>> That being said, I personally have never liked rate-limiting, it always >>> looked to me like the wrong solution. >> >> In fact, I *think* the only reason it may have been introduced is that >> there was a bug in the credit2 code at the time such that it always had >> a single runqueue no matter what your actual pcpu topology was. > > FWIW, we don't yet parse the pCPU topology on ARM. AFAIU, we always tell > Xen each CPU is in its own core. Will it have some implications in the > scheduler? Just checking -- you do mean its own core, as opposed to its own socket? (Or NUMA node?) On any system without hyperthreading (or with HT disabled), that's what an x86 system will see as well. Most schedulers have one runqueue per logical cpu. Credit2 has the option of having one runqueue per logical cpu, one per core (i.e., hyperthreads share a runqueue), one runqueue per socket (i.e., all cores on the same socket share a runqueue), or one socket across the whole system. I *think* we made one socket per core the default a while back to deal with multithreading, but I may not be remembering correctly. In any case, if you don't have threads, then reporting each logical cpu as its own core is the right thing to do. If you're mis-reporting sockets, then the scheduler will be unable to take that into account. But that's not usually going to be a major issue, mainly because the scheduler is not actually in a position to determine, most of the time, which is the optimal configuration. If two vcpus are communicating a lot, then the optimal configuration is to put them on different cores of the same socket (so they can share an L3 cache); if two vcpus are computing independently, then the optimal configuration is to put them on different sockets, so they can each have their own L3 cache. Xen isn't in a position to know which one is more important, so it just assumes each vcpu is independent. 
All that to say: It shouldn't be a major issue if you are mis-reporting sockets. :-) -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-17 11:28 ` George Dunlap @ 2017-07-19 11:21 ` Julien Grall 2017-07-20 9:25 ` Dario Faggioli 2017-07-20 9:10 ` Dario Faggioli 1 sibling, 1 reply; 49+ messages in thread From: Julien Grall @ 2017-07-19 11:21 UTC (permalink / raw) To: George Dunlap, Dario Faggioli, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov Hi George, On 17/07/17 12:28, George Dunlap wrote: > On 07/17/2017 11:04 AM, Julien Grall wrote: >> Hi, >> >> On 17/07/17 10:25, George Dunlap wrote: >>> On 07/12/2017 07:14 AM, Dario Faggioli wrote: >>>> On Fri, 2017-07-07 at 14:12 -0700, Stefano Stabellini wrote: >>>>> On Fri, 7 Jul 2017, Volodymyr Babchuk wrote: >>>>>>>> >>>>>>> Since you are using Credit, can you try to disable context switch >>>>>>> rate >>>>>>> limiting? >>>>>> >>>>>> Yep. You are right. In the environment described above (Case 2) I >>>>>> now >>>>>> get much better results: >>>>>> >>>>>> real 1.85 >>>>>> user 0.00 >>>>>> sys 1.85 >>>>> >>>>> From 113 to 1.85 -- WOW! >>>>> >>>>> Obviously I am no scheduler expert, but shouldn't we advertise a bit >>>>> better a scheduler configuration option that makes things _one >>>>> hundred >>>>> times faster_ ?! >>>>> >>>> So, to be fair, so far, we've bitten this hard by this only on >>>> artificially constructed test cases, where either some extreme >>>> assumption were made (e.g., that all the vCPUs except one always run at >>>> 100% load) or pinning was used in a weird and suboptimal way. And there >>>> are workload where it has been verified that it helps making >>>> performance better (poor SpecVIRT results without it was the main >>>> motivation having it upstream, and on by default). >>>> >>>> That being said, I personally have never liked rate-limiting, it always >>>> looked to me like the wrong solution. >>> >>> In fact, I *think* the only reason it may have been introduced is that >>> there was a bug in the credit2 code at the time such that it always had >>> a single runqueue no matter what your actual pcpu topology was. >> >> FWIW, we don't yet parse the pCPU topology on ARM. AFAIU, we always tell >> Xen each CPU is in its own core. Will it have some implications in the >> scheduler? > > Just checking -- you do mean its own core, as opposed to its own socket? > (Or NUMA node?) I don't know much about the scheduler, so I might say something stupid here :). Below the code we have for ARM /* XXX these seem awfully x86ish... */ /* representing HT siblings of each logical CPU */ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask); /* representing HT and core siblings of each logical CPU */ DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask); static void setup_cpu_sibling_map(int cpu) { if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) || !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) ) panic("No memory for CPU sibling/core maps"); /* A CPU is a sibling with itself and is always on its own core. */ cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu)); cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu)); } #define cpu_to_socket(_cpu) (0) After calling setup_cpu_sibling_map, we never touch cpu_sibling_mask and cpu_core_mask for a given pCPU. So I would say that each logical CPU is in its own core, but they are all in the same socket at the moment. > > On any system without hyperthreading (or with HT disabled), that's what > an x86 system will see as well. > > Most schedulers have one runqueue per logical cpu. 
Credit2 has the > option of having one runqueue per logical cpu, one per core (i.e., > hyperthreads share a runqueue), one runqueue per socket (i.e., all cores > on the same socket share a runqueue), or one socket across the whole > system. I *think* we made one socket per core the default a while back > to deal with multithreading, but I may not be remembering correctly. > > In any case, if you don't have threads, then reporting each logical cpu > as its own core is the right thing to do. The architecture doesn't disallow to do HT on ARM. Though, I am not aware of any cores using it today. > > If you're mis-reporting sockets, then the scheduler will be unable to > take that into account. But that's not usually going to be a major > issue, mainly because the scheduler is not actually in a position to > determine, most of the time, which is the optimal configuration. If two > vcpus are communicating a lot, then the optimal configuration is to put > them on different cores of the same socket (so they can share an L3 > cache); if two vcpus are computing independently, then the optimal > configuration is to put them on different sockets, so they can each have > their own L3 cache. Xen isn't in a position to know which one is more > important, so it just assumes each vcpu is independent. > > All that to say: It shouldn't be a major issue if you are mis-reporting > sockets. :-) Good to know, thank you for the explanation! We might want to parse the bindings correctly to get a bit of improvement. I will add a task on jira. Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
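To make the effect of the hardwired cpu_to_socket() above concrete: when runqueues are grouped by topology, a per-socket policy degenerates into a single runqueue if every CPU reports socket 0. A purely illustrative sketch, not Credit2's actual code:

    /* Illustrative only -- not Credit2's actual runqueue-selection code.
     * With ARM's "#define cpu_to_socket(_cpu) (0)" quoted above, the
     * per-socket policy puts every pCPU into runqueue 0, i.e. one big
     * runqueue shared by the whole system. */
    enum runq_policy { RUNQ_PER_CPU, RUNQ_PER_SOCKET, RUNQ_GLOBAL };

    static unsigned int pick_runqueue(unsigned int cpu, enum runq_policy p)
    {
        switch ( p )
        {
        case RUNQ_PER_CPU:    return cpu;                 /* one per pCPU  */
        case RUNQ_PER_SOCKET: return cpu_to_socket(cpu);  /* shared/socket */
        case RUNQ_GLOBAL:
        default:              return 0;                   /* single runq   */
        }
    }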
* Re: Notes on stubdoms and latency on ARM 2017-07-19 11:21 ` Julien Grall @ 2017-07-20 9:25 ` Dario Faggioli 0 siblings, 0 replies; 49+ messages in thread From: Dario Faggioli @ 2017-07-20 9:25 UTC (permalink / raw) To: Julien Grall, George Dunlap, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov [-- Attachment #1.1: Type: text/plain, Size: 3026 bytes --] On Wed, 2017-07-19 at 12:21 +0100, Julien Grall wrote: > On 17/07/17 12:28, George Dunlap wrote: > > Just checking -- you do mean its own core, as opposed to its own > > socket? > > (Or NUMA node?) > > I don't know much about the scheduler, so I might say something > stupid > here :). Below the code we have for ARM > > /* XXX these seem awfully x86ish... */ > /* representing HT siblings of each logical CPU */ > DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_mask); > /* representing HT and core siblings of each logical CPU */ > DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_mask); > > static void setup_cpu_sibling_map(int cpu) > { > if ( !zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) || > !zalloc_cpumask_var(&per_cpu(cpu_core_mask, cpu)) ) > panic("No memory for CPU sibling/core maps"); > > /* A CPU is a sibling with itself and is always on its own core. > */ > cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu)); > cpumask_set_cpu(cpu, per_cpu(cpu_core_mask, cpu)); > } > > #define cpu_to_socket(_cpu) (0) > > After calling setup_cpu_sibling_map, we never touch cpu_sibling_mask > and > cpu_core_mask for a given pCPU. So I would say that each logical CPU > is > in its own core, but they are all in the same socket at the moment. > Ah, fine... so you're in the exact opposite situation I was thinking about and reasoning upon in the reply to George I've just sent! :-P Ok, this basically means that, by default, in any ARM system, no matter how big or small, Credit2 will always use just one runqueue, from which _all_ the pCPUs will fish vCPUs, for running them. As said already, it's impossible to tell whether this is either bad or good, in the general case. It's good for fairness and load distribution (load balancing happens automatically, without the actual load balancing logic and code having to do anything at all!), but it's bad for lock contention (every runq operation, e.g., wakeup, schedule, etc., have to take the same lock). I think this explains at least part of why Stefano's wakeup latency numbers are rather bad with Credit2, on ARM, but that is not the case for my tests on x86. > > All that to say: It shouldn't be a major issue if you are mis- > > reporting > > sockets. :-) > > Good to know, thank you for the explanation! We might want to parse > the > bindings correctly to get a bit of improvement. I will add a task on > jira. > Yes, we should. Credit1 does not care about, but Credit2 is specifically designed to take advantage of these (and possibly even more!) information, so they need to be accurate. 
:-D Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
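As a concrete illustration of what Dario describes above — Credit2's default per-socket runqueues collapsing to one shared runqueue when every ARM pCPU reports socket 0 — here is a small, self-contained C sketch. This is not Xen code: cpu_to_core()/cpu_to_socket() are stand-ins for the topology data Xen derives at boot, hard-coded the way the current ARM code effectively reports it.

#include <stdio.h>

#define NR_CPUS 8

/* Stand-ins for the real topology maps; on ARM today each logical CPU
 * reports core == cpu and cpu_to_socket() is hard-coded to 0. */
static int cpu_to_core(int cpu)   { return cpu; }
static int cpu_to_socket(int cpu) { (void)cpu; return 0; }

static int runqueue_of(int cpu)
{
    /* Credit2's default arrangement: one runqueue per socket. */
    return cpu_to_socket(cpu);
}

int main(void)
{
    int rq_used[NR_CPUS] = { 0 };
    int nr_rq = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        int rq = runqueue_of(cpu);
        printf("pCPU%d: core %d, socket %d -> runqueue %d\n",
               cpu, cpu_to_core(cpu), cpu_to_socket(cpu), rq);
        if (!rq_used[rq]) {
            rq_used[rq] = 1;
            nr_rq++;
        }
    }

    /* With cpu_to_socket() returning 0 everywhere, all pCPUs share
     * runqueue 0, i.e. a single global runqueue for the whole system. */
    printf("total runqueues: %d\n", nr_rq);
    return 0;
}

With accurate socket information from the device tree bindings, the same loop would naturally split the pCPUs into one runqueue per socket.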
* Re: Notes on stubdoms and latency on ARM 2017-07-17 11:28 ` George Dunlap 2017-07-19 11:21 ` Julien Grall @ 2017-07-20 9:10 ` Dario Faggioli 1 sibling, 0 replies; 49+ messages in thread From: Dario Faggioli @ 2017-07-20 9:10 UTC (permalink / raw) To: George Dunlap, Julien Grall, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov [-- Attachment #1.1: Type: text/plain, Size: 3354 bytes --] On Mon, 2017-07-17 at 12:28 +0100, George Dunlap wrote: > Most schedulers have one runqueue per logical cpu. Credit2 has the > option of having one runqueue per logical cpu, one per core (i.e., > hyperthreads share a runqueue), one runqueue per socket (i.e., all > cores > on the same socket share a runqueue), or one socket across the whole > system. > You mean "or one runqueue across the whole system", I guess? :-) > I *think* we made one socket per core the default a while back > to deal with multithreading, but I may not be remembering correctly. > We've have per-core runqueue as default, to deal with hyperthreading for some time. Nowadays, handling hyperthreading is done independently by runqueue arrangement, and so the current default is one runqueue per-socket. > In any case, if you don't have threads, then reporting each logical > cpu as its own core is the right thing to do. > Yep. > If you're mis-reporting sockets, then the scheduler will be unable to > take that into account. > And if this means that each logical CPU is also reported as being its own socket, then you have one runqueue per logical CPU. > But that's not usually going to be a major > issue, mainly because the scheduler is not actually in a position to > determine, most of the time, which is the optimal configuration. If > two > vcpus are communicating a lot, then the optimal configuration is to > put > them on different cores of the same socket (so they can share an L3 > cache); if two vcpus are computing independently, then the optimal > configuration is to put them on different sockets, so they can each > have > their own L3 cache. > This is all very true. However, if two CPUs share one runqueue, vCPUs will seamlessly move between the two CPUs, without having to wait for the load balancing logic to kick in. This is a rather cheap way of achieving good fairness and load balancing, but is only effective if this movement is also cheap, which, e.g., is probably the case if the CPUs share some level of cache. So, figuring out what the best runqueue arrangement is, is rather hard to do automatically, as it depends both on the workload and on the hardware characteristics of the platform, but having at last some degree of runqueue sharing, among the CPUs that have some cache levels in common, would be, IMO, our best bet. And we do need topology information to try to do that. (We would also need, in Credit2 code, to take more into account cache and memory hierarchy information, rather than "just" CPU topology. We're already working, for instance, of changing CSCHED2_MIGRATE_RESIST from being constant, to vary depending on the amount of cache-sharing between two CPUs.) > All that to say: It shouldn't be a major issue if you are mis- > reporting > sockets. :-) > Maybe yes, maybe not. It may actually be even better on some combination of platforms and workloads, indeed... but it also means that the Credit2 load balancer is being invoked a lot, which may be unideal. 
Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
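As a rough illustration of the idea mentioned above — making migration resistance depend on how much cache two CPUs share — here is a hedged, self-contained C sketch. The topology helpers and the microsecond values are invented for the example; they are not the actual CSCHED2_MIGRATE_RESIST behaviour or constants.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical topology helpers: a real implementation would consult
 * the parsed CPU/cache topology rather than hard-coded arithmetic. */
static int same_core(int a, int b)   { return a / 2 == b / 2; } /* SMT pair  */
static int same_socket(int a, int b) { return a / 4 == b / 4; } /* shares L3 */

/*
 * Migration resistance (in microseconds) that grows as the source and
 * destination CPUs share fewer cache levels: moving between siblings
 * is nearly free, moving across sockets should be resisted more
 * because the vCPU loses its cache footprint.
 */
static uint64_t migrate_resist_us(int from, int to)
{
    if (from == to)            return 0;
    if (same_core(from, to))   return 100;   /* shares L1/L2   */
    if (same_socket(from, to)) return 500;   /* shares L3 only */
    return 2000;                             /* no shared cache */
}

int main(void)
{
    int pairs[][2] = { {0, 1}, {0, 2}, {0, 5} };

    for (unsigned i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
        printf("migrate pCPU%d -> pCPU%d: resist %llu us\n",
               pairs[i][0], pairs[i][1],
               (unsigned long long)migrate_resist_us(pairs[i][0], pairs[i][1]));
    return 0;
}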
* Re: Notes on stubdoms and latency on ARM 2017-07-17 9:25 ` George Dunlap 2017-07-17 10:04 ` Julien Grall @ 2017-07-20 8:49 ` Dario Faggioli 1 sibling, 0 replies; 49+ messages in thread From: Dario Faggioli @ 2017-07-20 8:49 UTC (permalink / raw) To: George Dunlap, Stefano Stabellini, Volodymyr Babchuk Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov [-- Attachment #1.1: Type: text/plain, Size: 3006 bytes --] On Mon, 2017-07-17 at 10:25 +0100, George Dunlap wrote: > On 07/12/2017 07:14 AM, Dario Faggioli wrote: > > > > That being said, I personally have never liked rate-limiting, it > > always > > looked to me like the wrong solution. > > In fact, I *think* the only reason it may have been introduced is > that > there was a bug in the credit2 code at the time such that it always > had > a single runqueue no matter what your actual pcpu topology was. > It has been introduced because SpecVirt perf were bad because, during interrupt storms, the context-switch rate was really really high. It was all about Credit1... Work on Credit2 was stalled at the time, and there has been, AFAICR, no evaluation of Credit2 was involved: https://wiki.xen.org/wiki/Credit_Scheduler#Context-Switch_Rate_Limiting https://lists.xenproject.org/archives/html/xen-devel/2011-12/msg00897.html%7C (And in fact, it was not implemented in Credit2, until something like last year, Anshul wrote the code for that.) SpecVirt performance were judged to be important enough (e.g., because we've been told people was using that for comparing us with other virt. solutions), that this was set to on by default. I don't know if that is still the case, as I've run many benchmarks, but never had the chance to try SpecVirt first hand myself. Fact is that Credit1 does not have any measure in place for limit/control context-switch rate, and it has boosting, which means that rate- limiting (as much as I may hate it :-P) is actually useful. Whether we should have it disabled by default, and tell people (in documentation) to enable it if they think they're seeing the system going into trashing because of context switching, or the vice-versa, it's one of those things which is rather hard to tell. Let's see... In Credit2, we do have CSCHED2_MIN_TIMER (which is not equivalent to ratelimiting, of course, but it at least is something that goes in the direction of trying to avoid too frequent interruptions), and (much more important, IMO) we don't have boosting... So, I think it would be interesting to try figuring out the role that rate-limiting plays, when Credit2 is in use (and then, maybe, if we find that there are differences, find a way to have, as default, it enabled on Credit1 and disabled on Credit2). > > I'll think about it, and see if I'll be able to run some benchmarks > > with it on and off. > > Thanks. FYI the main benchmark that was used to justify its > inclusion > (and on by default) was specvirt (I think). > Yeah, I know. I'm not sure I will have the chance to run that soon, though. I'll try a bunch of other workloads, and we'll see what I will find. 
:-) Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-07-07 17:03 ` Volodymyr Babchuk 2017-07-07 21:12 ` Stefano Stabellini @ 2017-07-08 14:26 ` Dario Faggioli 1 sibling, 0 replies; 49+ messages in thread From: Dario Faggioli @ 2017-07-08 14:26 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, George Dunlap, Julien Grall, Stefano Stabellini [-- Attachment #1.1: Type: text/plain, Size: 4005 bytes --] On Fri, 2017-07-07 at 10:03 -0700, Volodymyr Babchuk wrote: > On 7 July 2017 at 09:41, Dario Faggioli <dario.faggioli@citrix.com> > wrote: > > > > Also, are you sure (e.g., because of how the Linux driver is done) > > that > > this always happen on one vCPU? > > No, I can't guarantee that. Linux driver is single threaded, but I > did > nothing to pin in to a certain CPU. > Ok, it was just to understand. > > > > > - In total there are 6 vcpus active > > > > > > I run test in DomU: > > > real 113.08 > > > user 0.00 > > > sys 113.04 > > > > > > > Ok, so there's contention for pCPUs. Dom0's vCPUs are CPU hogs, > > while, > > if my assumption above is correct, the "SMC vCPU" of the DomU is > > I/O > > bound, in the sense that it blocks on an operation --which turns > > out to > > be SMC call to MiniOS-- then resumes and block again almost > > immediately. > > > > Since you are using Credit, can you try to disable context switch > > rate > > limiting? Something like: > > > > # xl sched-credit -s -r 0 > > > > should work. > > Yep. You are right. In the environment described above (Case 2) I now > get much better results: > > real 1.85 > user 0.00 > sys 1.85 > Ok, glad to hear it worked! :-) > > This looks to me like one of those typical scenario where rate > > limiting > > is counterproductive. In fact, every time that your SMC vCPU is > > woken > > up, despite being boosted, it finds all the pCPUs busy, and it > > can't > > preempt any of the vCPUs that are running there, until rate > > limiting > > expires. > > > > That means it has to wait an interval of time that varies between 0 > > and > > 1ms. This happens 100000 times, and 1ms*100000 is 100 seconds... > > Which > > is roughly how the test takes, in the overcommitted case. > > Yes, looks like that was the case. Does this means that ratelimiting > should be disabled for any domain that is backed up with device > model? > AFAIK, device models are working in the exactly same way. > Rate limiting is a scheduler-wide thing. If it's on, all the context switching rate of all domains is limited. If it's off, none is. We'll have to see when we will have something that is less of a proof- of-concept, but it is very likely that, for your use case, rate- limiting should just be kept disabled (you can do that with a Xen boot time parameter, so that you don't have to issue the command all the times). > > Yes, but it again makes sense. In fact, now there are 3 CPUs in > > Pool-0, > > and all are kept always busy by the the 3 DomU vCPUs running > > endless > > loops. So, when the DomU's SMC vCPU wakes up, has again to wait for > > the > > rate limit to expire on one of them. > > Yes, as this was caused by ratelimit, this makes perfect sense. Thank > you. > > I tried number of different cases. Now execution time depends > linearly > on number of over-committed vCPUs (about +200ms for every busy vCPU). > That is what I'm expected. > Is this the case even when MiniOS is in its own cpupool? 
If yes, it means that the slowdown is caused by the contention between the vCPU that is doing the SMC calls and the other vCPUs (of either the same or other domains). That should not really happen in this case (or, at least, it should not grow linearly), since you are on Credit1, and there the SMC vCPU should pretty much always be boosted, and hence get scheduled almost immediately, no matter how many CPU hogs there are around. Depending on the specific details of your use case/product, we can try to assign different weights to the various domains... but I need to think a bit more about this... Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
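For readers not familiar with how rate limiting produces the 0–1ms stalls analysed in this sub-thread, here is a simplified, non-authoritative C sketch of the decision a Credit1-style scheduler makes when a boosted vCPU wakes up. The names and structure are illustrative, not the actual Xen code; only the 1ms default matches what the thread describes.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RATELIMIT_US 1000   /* the default rate limit: 1ms */

struct vcpu_state {
    bool     runnable;
    uint64_t ran_for_us;    /* how long the current vCPU has been running */
    int      priority;      /* higher value = more important (e.g. boosted) */
};

/*
 * Keep the currently running vCPU on the pCPU, even though a
 * higher-priority vCPU has just woken up, until it has run for at
 * least RATELIMIT_US. This is why each of the ~100000 wakeups of the
 * SMC vCPU can stall for anywhere between 0 and 1ms, adding up to
 * roughly the 100 seconds observed in the benchmark above.
 */
static bool keep_current(const struct vcpu_state *cur,
                         const struct vcpu_state *woken)
{
    if (!cur->runnable)
        return false;                        /* current vCPU blocked: switch  */
    if (cur->ran_for_us < RATELIMIT_US)
        return true;                         /* rate limit: no preemption yet */
    return woken->priority <= cur->priority; /* normal priority comparison    */
}

int main(void)
{
    struct vcpu_state hog = { .runnable = true, .ran_for_us = 300, .priority = 1 };
    struct vcpu_state smc = { .runnable = true, .ran_for_us = 0,   .priority = 2 };

    printf("boosted vCPU has to wait: %s\n",
           keep_current(&hog, &smc) ? "yes (rate limit in effect)" : "no");
    return 0;
}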
* Re: Notes on stubdoms and latency on ARM 2017-06-19 18:36 ` Volodymyr Babchuk 2017-06-20 10:11 ` Dario Faggioli @ 2017-06-20 10:45 ` Julien Grall 2017-06-20 16:23 ` Volodymyr Babchuk 1 sibling, 1 reply; 49+ messages in thread From: Julien Grall @ 2017-06-20 10:45 UTC (permalink / raw) To: Volodymyr Babchuk, Stefano Stabellini Cc: Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov, George Dunlap On 06/19/2017 07:36 PM, Volodymyr Babchuk wrote: > Hi Stefano, Hi, > On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@kernel.org> wrote: > >>> But given the conversation so far, it seems likely that that is mainly >>> due to the fact that context switching on ARM has not been optimized. >> >> True. However, Volodymyr took the time to demonstrate the performance of >> EL0 apps vs. stubdoms with a PoC, which is much more than most Xen >> contributors do. Nodoby provided numbers for a faster ARM context switch >> yet. I don't know on whom should fall the burden of proving that a >> lighter context switch can match the EL0 app numbers. I am not sure it >> would be fair to ask Volodymyr to do it. > Thanks. Actually, we discussed this topic internally today. Main > concern today is not a SMCs and OP-TEE (I will be happy to do this > right in XEN), but vcopros and GPU virtualization. Because of legal > issues, we can't put this in XEN. And because of vcpu framework nature > we will need multiple calls to vgpu driver per one vcpu context > switch. > I'm going to create worst case scenario, where multiple vcpu are > active and there are no free pcpu, to see how credit or credit2 > scheduler will call my stubdom. > Also, I'm very interested in Julien's idea about stubdom without GIC. > Probably, I'll try to hack something like that to see how it will > affect overall switching latency This can only work if your stubdomain does not require interrupt. However, if you are dealing with devices you likely need interrupts, am I correct? The problem would be the same with an EL0 app. Cheers. -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-20 10:45 ` Julien Grall @ 2017-06-20 16:23 ` Volodymyr Babchuk 2017-06-21 10:38 ` Julien Grall 0 siblings, 1 reply; 49+ messages in thread From: Volodymyr Babchuk @ 2017-06-20 16:23 UTC (permalink / raw) To: Julien Grall Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli, George Dunlap, Stefano Stabellini Hi Julien, On 20 June 2017 at 03:45, Julien Grall <julien.grall@arm.com> wrote: >> On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@kernel.org> >> wrote: >> >>>> But given the conversation so far, it seems likely that that is mainly >>>> due to the fact that context switching on ARM has not been optimized. >>> >>> >>> True. However, Volodymyr took the time to demonstrate the performance of >>> EL0 apps vs. stubdoms with a PoC, which is much more than most Xen >>> contributors do. Nodoby provided numbers for a faster ARM context switch >>> yet. I don't know on whom should fall the burden of proving that a >>> lighter context switch can match the EL0 app numbers. I am not sure it >>> would be fair to ask Volodymyr to do it. >> >> Thanks. Actually, we discussed this topic internally today. Main >> concern today is not a SMCs and OP-TEE (I will be happy to do this >> right in XEN), but vcopros and GPU virtualization. Because of legal >> issues, we can't put this in XEN. And because of vcpu framework nature >> we will need multiple calls to vgpu driver per one vcpu context >> switch. >> I'm going to create worst case scenario, where multiple vcpu are >> active and there are no free pcpu, to see how credit or credit2 >> scheduler will call my stubdom. >> Also, I'm very interested in Julien's idea about stubdom without GIC. >> Probably, I'll try to hack something like that to see how it will >> affect overall switching latency > > This can only work if your stubdomain does not require interrupt. However, > if you are dealing with devices you likely need interrupts, am I correct? Ah yes, you are correct. I thought about OP-TEE use case, when there are no interrupts. In case of co-processor virtualization we probably will need interrupts. > The problem would be the same with an EL0 app. In case of EL0 there will be no problem, because EL0 can't handle interrupts :) XEN should receive interrupt and invoke app. Yes, this is another problem with apps, if we want to use them as devices drivers. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-20 16:23 ` Volodymyr Babchuk @ 2017-06-21 10:38 ` Julien Grall 0 siblings, 0 replies; 49+ messages in thread From: Julien Grall @ 2017-06-21 10:38 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, xen-devel, Andrii Anisov, Dario Faggioli, George Dunlap, Stefano Stabellini On 20/06/17 17:23, Volodymyr Babchuk wrote: > Hi Julien, Hi Volodymyr, > > On 20 June 2017 at 03:45, Julien Grall <julien.grall@arm.com> wrote: >>> On 19 June 2017 at 10:54, Stefano Stabellini <sstabellini@kernel.org> >>> wrote: >>> >>>>> But given the conversation so far, it seems likely that that is mainly >>>>> due to the fact that context switching on ARM has not been optimized. >>>> >>>> >>>> True. However, Volodymyr took the time to demonstrate the performance of >>>> EL0 apps vs. stubdoms with a PoC, which is much more than most Xen >>>> contributors do. Nodoby provided numbers for a faster ARM context switch >>>> yet. I don't know on whom should fall the burden of proving that a >>>> lighter context switch can match the EL0 app numbers. I am not sure it >>>> would be fair to ask Volodymyr to do it. >>> >>> Thanks. Actually, we discussed this topic internally today. Main >>> concern today is not a SMCs and OP-TEE (I will be happy to do this >>> right in XEN), but vcopros and GPU virtualization. Because of legal >>> issues, we can't put this in XEN. And because of vcpu framework nature >>> we will need multiple calls to vgpu driver per one vcpu context >>> switch. >>> I'm going to create worst case scenario, where multiple vcpu are >>> active and there are no free pcpu, to see how credit or credit2 >>> scheduler will call my stubdom. >>> Also, I'm very interested in Julien's idea about stubdom without GIC. >>> Probably, I'll try to hack something like that to see how it will >>> affect overall switching latency >> >> This can only work if your stubdomain does not require interrupt. However, >> if you are dealing with devices you likely need interrupts, am I correct? > Ah yes, you are correct. I thought about OP-TEE use case, when there > are no interrupts. In case of co-processor virtualization we probably > will need interrupts. > >> The problem would be the same with an EL0 app. > In case of EL0 there will be no problem, because EL0 can't handle > interrupts :) XEN should receive interrupt and invoke app. Yes, this > is another problem with apps, if we want to use them as devices > drivers. Well, this is a bit more complex than that. When you receive an interrupt Xen may run a vCPU that will not use that app. So you have to ensure the time will not get accounted for it. The more I read the discussion, the more I think we should look at optimizing the stubdom case. Xen EL0 should only be used for tiny emulation for a given domain. Otherwise you end up to re-invent the domain. Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-19 9:37 ` George Dunlap 2017-06-19 17:54 ` Stefano Stabellini @ 2017-06-19 18:26 ` Volodymyr Babchuk 2017-06-20 10:00 ` Dario Faggioli 1 sibling, 1 reply; 49+ messages in thread From: Volodymyr Babchuk @ 2017-06-19 18:26 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, Julien Grall, xen-devel Hi George, On 19 June 2017 at 02:37, George Dunlap <george.dunlap@citrix.com> wrote: >>>>> There is no way out: if the stubdom needs events, then we'll have to >>>>> expose and context switch the vGIC. If it doesn't, then we can skip the >>>>> vGIC. However, we would have a similar problem with EL0 apps: I am >>>>> assuming that EL0 apps don't need to handle interrupts, but if they do, >>>>> then they might need something like a vGIC. >>>> Hm. Correct me, but if we want make stubdom to handle some requests >>>> (e.g. emulate MMIO access), then it needs events, and thus it needs >>>> interrupts. At least, I'm not aware about any other mechanism, that >>>> allows hypervisor to signal to a domain. >>>> On other hand, EL0 app (as I see them) does not need such events. >>>> Basically, you just call function `handle_mmio()` right in the app. >>>> So, apps can live without interrupts and they still be able to handle >>>> request. >>> >>> So remember that "interrupt" and "event" are basically the same as >>> "structured callback". When anything happens that Xen wants to tell the >>> EL0 app about, it has to have a way of telling it. If the EL0 app is >>> handling a device, it has to have some way of getting interrupts from >>> that device; if it needs to emulate devices sent to the guest, it needs >>> some way to tell Xen to deliver an interrupt to the guest. >> Basically yes. There should be mechanism to request something from >> native application. Question is how this mechanism can be implemented. >> Classical approach is a even-driven loop: >> >> while(1) { >> wait_for_event(); >> handle_event_event(); >> return_back_results(); >> } >> >> wait_for_event() can by anything from WFI instruction to read() on >> socket. This is how stubdoms are working. I agree with you: there are >> no sense to repeat this in native apps. >> >>> Now, we could make the EL0 app interface "interruptless". Xen could >>> write information about pending events in a shared memory region, and >>> the EL0 app could check that before calling some sort of block() >>> hypercall, and check it again when it returns from the block() call. >> >>> But the shared event information starts to look an awful lot like events >>> and/or pending bits on an interrupt controller -- the only difference >>> being that you aren't interrupted if you're already running. >> >> Actually there are third way, which I have used. I described it in >> original email (check out [1]). >> Basically, native application is dead until it is needed by >> hypervisor. When hypervisor wants some services from app, it setups >> parameters, switches mode to EL0 and jumps at app entry point. > > What's the difference between "jumps to an app entry point" and "jumps > to an interrupt handling routine"? "Jumps to an app entry point" and "Unblocks vcpu that waits for an interrupt". That would be more precise. There are two differences: first approach is synchronous, no need to wait scheduler to schedule vcpu. Also vGIC code can be omitted, which decreases switch latency. 
> And what's the difference between > "Tells Xen about the location of the app entry point" and "tells Xen > about the location of the interrupt handling routine"? There are no difference at all. > If you want this "EL0 app" thing to be able to provide extra security > over just running the code inside of Xen, then the code must not be able > to DoS the host by spinning forever instead of returning. Right. This is a problem. Fortunately, it is running with interrupts enabled, so next timer tick will switch back to XEN. There you can terminate app which is running too long. > What happens if two different pcpus in Xen decide they want to activate > some "app" functionality? There are two possibilities: we can make app single threaded, then second pcpu can be assigned with another vcpu until app is busy. But I don't like this approach. I think that all apps should be multi threaded. They can use simple spinlocks to control access to shared resources. >>> I'm pretty sure you could run in this mode using the existing interfaces >>> if you didn't want the hassle of dealing with asynchrony. If that's the >>> case, then why bother inventing an entirely new interface, with its own >>> bugs and duplication of functionality? Why not just use what we already >>> have? >> Because we are concerned about latency. In my benchmark, my native app >> PoC is 1.6 times faster than stubdom. > > But given the conversation so far, it seems likely that that is mainly > due to the fact that context switching on ARM has not been optimized. Yes. Question is: can context switching in ARM be optimized more? I don't know. > Just to be clear -- I'm not adamantly opposed to a new interface similar > to what you're describing above. But I would be opposed to introducing > a new interface that doesn't achieve the stated goals (more secure, &c), > or a new interface that is the same as the old one but rewritten a bit. > > The point of having this design discussion up front is to prevent a > situation where you spend months coding up something which is ultimately > rejected. There are a lot of things that are hard to predict until > there's actually code to review, but at the moment the "jumps to an > interrupt handling routine" approach looks unpromising. Yes, I'm agree with you. This is why I started those mail threads in the first place. Actually, after all that discussions I stick more to some sort of lightweight domain-bound stubdoms (without vGICs, for example). But I want to discuss all possibilities, including native apps. Actually, what we really need right now is a hard numbers. I did one benchmark, but that was ideal use case. I'm going to do more experiments: with 1 or 1.5 active vcpu per pcpu, with p2m context switch stripped off, etc. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
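To make the model Volodymyr describes a bit more tangible — "jump to the app entry point" as a synchronous, run-to-completion call rather than an event delivered to a blocked vCPU — here is a minimal user-space C sketch of the dispatch idea. It only illustrates the calling convention (function number plus parameters, result returned synchronously); the EL0 mode switch, register passing and address-space handling of a real implementation are of course not modelled, and all names here are invented for the example.

#include <stdint.h>
#include <stdio.h>

/* Function numbers a hypothetical EL0 app could expose. */
enum app_fn { APP_FN_HANDLE_MMIO = 0, APP_FN_MAX };

struct app_call {
    uint64_t args[7];   /* the parameters the hypervisor sets up */
    uint64_t ret;
};

/* The "app": runs to completion, no events, no interrupts, no vGIC. */
static void handle_mmio(struct app_call *c)
{
    uint64_t is_write = c->args[1];
    /* Pretend to emulate one device register: reads return 0x42. */
    c->ret = is_write ? 0 : 0x42;
}

typedef void (*app_entry_t)(struct app_call *);

static app_entry_t app_table[APP_FN_MAX] = {
    [APP_FN_HANDLE_MMIO] = handle_mmio,
};

/* What the hypervisor side does in this model: set up the arguments,
 * "jump" to the entry point, and collect the result synchronously,
 * with no scheduler involvement in between. */
static uint64_t app_call(enum app_fn fn, struct app_call *c)
{
    app_table[fn](c);
    return c->ret;
}

int main(void)
{
    struct app_call c = { .args = { 0x49000000, 0 /* read */ } };

    printf("emulated MMIO read returned 0x%llx\n",
           (unsigned long long)app_call(APP_FN_HANDLE_MMIO, &c));
    return 0;
}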
* Re: Notes on stubdoms and latency on ARM 2017-06-19 18:26 ` Volodymyr Babchuk @ 2017-06-20 10:00 ` Dario Faggioli 2017-06-20 10:30 ` George Dunlap 0 siblings, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-06-20 10:00 UTC (permalink / raw) To: Volodymyr Babchuk, George Dunlap Cc: Artem_Mygaiev, Julien Grall, Stefano Stabellini, Andrii Anisov, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 1268 bytes --] On Mon, 2017-06-19 at 11:26 -0700, Volodymyr Babchuk wrote: > On 19 June 2017 at 02:37, George Dunlap <george.dunlap@citrix.com> > wrote: > > If you want this "EL0 app" thing to be able to provide extra > > security > > over just running the code inside of Xen, then the code must not be > > able > > to DoS the host by spinning forever instead of returning. > > Right. This is a problem. Fortunately, it is running with interrupts > enabled, so next timer tick will switch back to XEN. There you can > terminate app which is running too long. > What timer tick? Xen does not have one. A scheduler may setup one, if it's necessary for its own purposes, but that's entirely optional. For example, Credit does have one; Credit2, RTDS and null do not. Basically, (one of the) main purposes of this new "EL0 app mechanism" is playing behind the scheduler back. Well, fine, but then you're not allowed to assume that the scheduler will rescue you if something goes wrong. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-06-20 10:00 ` Dario Faggioli @ 2017-06-20 10:30 ` George Dunlap 0 siblings, 0 replies; 49+ messages in thread From: George Dunlap @ 2017-06-20 10:30 UTC (permalink / raw) To: Dario Faggioli, Volodymyr Babchuk Cc: Artem_Mygaiev, Julien Grall, Stefano Stabellini, Andrii Anisov, xen-devel On 20/06/17 11:00, Dario Faggioli wrote: > On Mon, 2017-06-19 at 11:26 -0700, Volodymyr Babchuk wrote: >> On 19 June 2017 at 02:37, George Dunlap <george.dunlap@citrix.com> >> wrote: >>> If you want this "EL0 app" thing to be able to provide extra >>> security >>> over just running the code inside of Xen, then the code must not be >>> able >>> to DoS the host by spinning forever instead of returning. >> >> Right. This is a problem. Fortunately, it is running with interrupts >> enabled, so next timer tick will switch back to XEN. There you can >> terminate app which is running too long. >> > What timer tick? Xen does not have one. A scheduler may setup one, if > it's necessary for its own purposes, but that's entirely optional. For > example, Credit does have one; Credit2, RTDS and null do not. > > Basically, (one of the) main purposes of this new "EL0 app mechanism" > is playing behind the scheduler back. Well, fine, but then you're not > allowed to assume that the scheduler will rescue you if something goes > wrong. Well another possibility would be to add "timeouts" to "calls" into the el0 app: i.e., part of the calling mechanism itself would be to set a timer to come back into Xen and fail the call. But what to do if you fail? You could just stop executing the "app", but there's no telling what state its memory will be in, nor any device it's using. It's probably not safe to continue using. Do you crash it? Restart it? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
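One way to picture the "timeouts on calls into the EL0 app" option George raises is a deadline armed before entering the app and an escape path taken when it fires. The sketch below is only a user-space analogy using POSIX alarm()/siglongjmp(), not a proposal for the actual Xen mechanism; what to do after the escape (crash, restart, refuse further calls) is exactly the open policy question above.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigjmp_buf timeout_env;

static void on_timeout(int sig)
{
    (void)sig;
    siglongjmp(timeout_env, 1);   /* abandon the call into the app */
}

/* A misbehaving "app" that never returns control. */
static void runaway_app(void)
{
    for (;;)
        ;
}

/* Call into the app with a deadline: 0 on success, -1 on timeout. */
static int app_call_with_timeout(void (*entry)(void), unsigned seconds)
{
    struct sigaction sa = { 0 };

    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    if (sigsetjmp(timeout_env, 1)) {
        /* The deadline fired: the app overran its budget. Its memory
         * and any device it touched are now in an unknown state. */
        return -1;
    }

    alarm(seconds);   /* arm the deadline before entering the app */
    entry();
    alarm(0);         /* disarm it on a normal return */
    return 0;
}

int main(void)
{
    if (app_call_with_timeout(runaway_app, 1) < 0)
        printf("app call timed out; its state is now suspect\n");
    return 0;
}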
* Re: Notes on stubdoms and latency on ARM 2017-05-19 19:45 ` Volodymyr Babchuk 2017-05-22 21:41 ` Stefano Stabellini @ 2017-05-23 7:11 ` Dario Faggioli 2017-05-26 20:09 ` Volodymyr Babchuk 2017-05-23 9:08 ` George Dunlap 2 siblings, 1 reply; 49+ messages in thread From: Dario Faggioli @ 2017-05-23 7:11 UTC (permalink / raw) To: Volodymyr Babchuk, Stefano Stabellini Cc: Artem_Mygaiev, Julien Grall, xen-devel, Andrii Anisov, George Dunlap [-- Attachment #1.1: Type: text/plain, Size: 1935 bytes --] On Fri, 2017-05-19 at 22:45 +0300, Volodymyr Babchuk wrote: > On 18 May 2017 at 22:00, Stefano Stabellini <sstabellini@kernel.org> > wrote: > > ACTIONS: > > Improve the null scheduler to enable decent stubdoms scheduling on > > latency sensitive systems. > > I'm not very familiar with XEN schedulers. > Feel free to ask anything. :-) > Looks like null scheduler > is good for hard RT, but isn't fine for a generic consumer system. > The null scheduler is meant at being useful when you have a static scenario, no (or very few) overbooking (i.e., total nr of vCPUs ~= nr of pCPUS), and what to cut to _zero_ the scheduling overhead. That may include certain class of real-time workloads, but it not limited to such use case. > How > do you think: is it possible to make credit2 scheduler to schedule > stubdoms in the same way? > It is indeed possible. Actually, it's actually in the plans to do exactly something like that, as it could potentially be useful for a wide range of use cases. Doing it in the null scheduler is just easier, and we think it would be a nice way to quickly have a proof of concept done. Afterwards, we'll focus on other schedulers too. > > Investigate ways to improve context switch times on ARM. > > Do you have any tools to profile or trace XEN core? Also, I don't > think that pure context switch time is the biggest issue. Even now, > it > allows 180 000 switches per second (if I'm not wrong). I think, > scheduling latency is more important. > What do you refer to when you say 'scheduling latency'? As in, the latency between which events, happening on which component? Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
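For readers who, like Volodymyr, are not familiar with the null scheduler, its core idea can be pictured with the sketch below. This is illustrative C, not the real Xen null scheduler code: each pCPU has at most one statically assigned vCPU, so a "scheduling decision" is just returning that assignment — no runqueues to scan and no balancing — which is why the overhead is close to zero even compared with 1-to-1 pinning under a general-purpose scheduler.

#include <stdio.h>
#include <stddef.h>

#define NR_PCPUS 4

struct vcpu { const char *name; };

/* Static 1:1 assignment, fixed when the domains are created. */
static struct vcpu dom1_v0 = { "d1v0" }, dom2_v0 = { "d2v0" };
static struct vcpu *assigned[NR_PCPUS] = { &dom1_v0, &dom2_v0, NULL, NULL };

/*
 * "Scheduling decision" in a null-style scheduler: no runqueue, no
 * priorities, no load balancing -- just return whatever vCPU was
 * assigned to this pCPU, or idle if there is none.
 */
static struct vcpu *do_schedule(int pcpu)
{
    return assigned[pcpu];   /* NULL means "run the idle vCPU" */
}

int main(void)
{
    for (int p = 0; p < NR_PCPUS; p++) {
        struct vcpu *v = do_schedule(p);
        printf("pCPU%d runs %s\n", p, v ? v->name : "idle");
    }
    return 0;
}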
* Re: Notes on stubdoms and latency on ARM 2017-05-23 7:11 ` Dario Faggioli @ 2017-05-26 20:09 ` Volodymyr Babchuk 2017-05-27 2:10 ` Dario Faggioli 0 siblings, 1 reply; 49+ messages in thread From: Volodymyr Babchuk @ 2017-05-26 20:09 UTC (permalink / raw) To: Dario Faggioli Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, George Dunlap, Julien Grall, xen-devel Hello Dario, >> I'm not very familiar with XEN schedulers. > Feel free to ask anything. :-) I'm so unfamiliar, so even don't know what to ask :) But thank you. Surely I'll have questions. >> Looks like null scheduler >> is good for hard RT, but isn't fine for a generic consumer system. >> > The null scheduler is meant at being useful when you have a static > scenario, no (or very few) overbooking (i.e., total nr of vCPUs ~= nr > of pCPUS), and what to cut to _zero_ the scheduling overhead. > > That may include certain class of real-time workloads, but it not > limited to such use case. Can't I achieve the same with any other scheduler by pining one vcpu to one pcpu? >> How >> do you think: is it possible to make credit2 scheduler to schedule >> stubdoms in the same way? >> > It is indeed possible. Actually, it's actually in the plans to do > exactly something like that, as it could potentially be useful for a > wide range of use cases. > > Doing it in the null scheduler is just easier, and we think it would be > a nice way to quickly have a proof of concept done. Afterwards, we'll > focus on other schedulers too. > >> Do you have any tools to profile or trace XEN core? Also, I don't >> think that pure context switch time is the biggest issue. Even now, >> it >> allows 180 000 switches per second (if I'm not wrong). I think, >> scheduling latency is more important. >> > What do you refer to when you say 'scheduling latency'? As in, the > latency between which events, happening on which component? I'm worried about interval between task switching events. For example: vcpu1 is vcpu of some domU and vcpu2 is vcpu of stubdom that runs device emulator for domU. vcpu1 issues MMIO access that should be handled by vcpu2 and gets blocked by hypervisor. Then there will be two context switches: vcpu1->vcpu2 to emulate that MMIO access and vcpu2->vcpu1 to continue work. AFAIK, credit2 does not guarantee that vcpu2 will be scheduled right after when vcpu1 will be blocked. It can schedule some vcpu3, then vcpu4 and only then come back to vcpu2. That time interval between event "vcpu2 was made runable" and event "vcpu2 was scheduled on pcpu" is what I call 'scheduling latency'. This latency can be minimized by mechanism similar to priority inheritance: if scheduler knows that vcpu1 waits for vcpu2 and there are remaining time slice for vcpu1 it should select vcpu2 as next scheduled vcpu. Problem is how to populate such dependencies. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-26 20:09 ` Volodymyr Babchuk @ 2017-05-27 2:10 ` Dario Faggioli 0 siblings, 0 replies; 49+ messages in thread From: Dario Faggioli @ 2017-05-27 2:10 UTC (permalink / raw) To: Volodymyr Babchuk Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, George Dunlap, Julien Grall, xen-devel [-- Attachment #1.1: Type: text/plain, Size: 4448 bytes --] On Fri, 2017-05-26 at 13:09 -0700, Volodymyr Babchuk wrote: > Hello Dario, > Hi, > > Feel free to ask anything. :-) > > I'm so unfamiliar, so even don't know what to ask :) But thank you. > Surely I'll have questions. > Sure. As soon as you have one, go ahead with it. > > The null scheduler is meant at being useful when you have a static > > scenario, no (or very few) overbooking (i.e., total nr of vCPUs ~= > > nr > > of pCPUS), and what to cut to _zero_ the scheduling overhead. > > > > That may include certain class of real-time workloads, but it not > > limited to such use case. > > Can't I achieve the same with any other scheduler by pining one vcpu > to one pcpu? > Of course you can, but not with the same (small!!) level of overhead of the null scheduler. In fact, even if you do 1-to-1 pinning of all the vcpus, the general purpose scheduler (like Credit1 and Credit2) can't rely on assumptions that something like that is indeed in effect, and that it will always be. For instance, if you have all vcpus except one pinned to 1 pCPU. That one missing vcpu, in its turn, can run everywhere. The scheduler has to always go and see which vcpu is the one that is free to run everywhere, and whether it should (for instance) preempt any (and, if yes, which) of the pinned ones. Also, still in those scheduler, there may be multiple vcpus that are pinned to the same pCPU. In which case, the scheduler, at each scheduling decision, needs to figure out which ones (among all the vcpus) they are, and which one has the right to run on the pCPU. And, unfortunately, since pinning can change 100% asynchronously wrt the scheduler, it's really not possible to either make assumptions, nor even to try to capture some (special case) situation in a data structure. Therefore, yes, if you configure 1-to-1 pinning in Credit1 or Credit2, the actual schedule would be the same. But that will be achieve with almost the same computational overhead, as if the vcpus were free. OTOH, the null scheduler is specifically designed for the (semi-)static 1-to-1 pinning use case, so the overhead it introduces (for making scheduling decisions) is close to zero. > > > Do you have any tools to profile or trace XEN core? Also, I don't > > > think that pure context switch time is the biggest issue. Even > > > now, > > > it > > > allows 180 000 switches per second (if I'm not wrong). I think, > > > scheduling latency is more important. > > > > > > > What do you refer to when you say 'scheduling latency'? As in, the > > latency between which events, happening on which component? > > I'm worried about interval between task switching events. > For example: vcpu1 is vcpu of some domU and vcpu2 is vcpu of stubdom > that runs device emulator for domU. > vcpu1 issues MMIO access that should be handled by vcpu2 and gets > blocked by hypervisor. Then there will be two context switches: > vcpu1->vcpu2 to emulate that MMIO access and vcpu2->vcpu1 to continue > work. AFAIK, credit2 does not guarantee that vcpu2 will be scheduled > right after when vcpu1 will be blocked. It can schedule some vcpu3, > then vcpu4 and only then come back to vcpu2. 
That time interval > between event "vcpu2 was made runable" and event "vcpu2 was scheduled > on pcpu" is what I call 'scheduling latency'. > Yes, currently, that's true. Basically, from the scheduling point of view, there's no particular relationship between a domain's vcpu, and the vcpu of the driver/stub-dom that service the domain itself. But there's a plan to change that, as both I and Stefano said already, and do something in all schedulers. We'll just start with null, because it's the easiest. :-) > This latency can be minimized by mechanism similar to priority > inheritance: if scheduler knows that vcpu1 waits for vcpu2 and there > are remaining time slice for vcpu1 it should select vcpu2 as next > scheduled vcpu. Problem is how to populate such dependencies. > I've spent my PhD studying and doing stuff around priority inheritance... so something similar to that, is exactly what I had in mind. :-D Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
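A very rough sketch of the priority-inheritance-like idea discussed above — purely illustrative, none of this exists in Xen, and how the "waiting_on" dependency would be populated is exactly the open problem Volodymyr points at: if the scheduler knows that the vCPU which just blocked is waiting on a specific service vCPU (the stubdom/emulator), it can hand the remainder of the slice straight to that vCPU instead of whatever is next in the runqueue.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct vcpu {
    const char  *name;
    bool         runnable;
    struct vcpu *waiting_on;   /* e.g. the stubdom vCPU servicing us */
};

/* Hypothetical hook, called when `prev` blocks on some pCPU. */
static struct vcpu *pick_next(struct vcpu *prev, struct vcpu *runq_head)
{
    /*
     * If prev blocked waiting for its service vCPU, run that vCPU
     * immediately, "inheriting" prev's slot, rather than letting it
     * queue behind unrelated vCPUs. This is what keeps the
     * MMIO-emulation round trip short.
     */
    if (prev->waiting_on && prev->waiting_on->runnable)
        return prev->waiting_on;

    return runq_head;   /* fall back to the normal runqueue order */
}

int main(void)
{
    struct vcpu stub  = { "stubdom-v0", true,  NULL  };
    struct vcpu other = { "d3v0",       true,  NULL  };
    struct vcpu guest = { "d1v0",       false, &stub };   /* blocked on MMIO */

    printf("next after %s blocks: %s\n",
           guest.name, pick_next(&guest, &other)->name);
    return 0;
}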
* Re: Notes on stubdoms and latency on ARM 2017-05-19 19:45 ` Volodymyr Babchuk 2017-05-22 21:41 ` Stefano Stabellini 2017-05-23 7:11 ` Dario Faggioli @ 2017-05-23 9:08 ` George Dunlap 2017-05-26 19:43 ` Volodymyr Babchuk 2 siblings, 1 reply; 49+ messages in thread From: George Dunlap @ 2017-05-23 9:08 UTC (permalink / raw) To: Volodymyr Babchuk, Stefano Stabellini Cc: Artem_Mygaiev, Dario Faggioli, xen-devel, Andrii Anisov, Julien Grall On 19/05/17 20:45, Volodymyr Babchuk wrote: > Hi Stefano, > > On 18 May 2017 at 22:00, Stefano Stabellini <sstabellini@kernel.org> wrote: > >> Description of the problem: need for a place to run emulators and >> mediators outside of Xen, with low latency. >> >> Explanation of what EL0 apps are. What should be their interface with >> Xen? Could the interface be the regular hypercall interface? In that >> case, what's the benefit compared to stubdoms? > I imagined this as separate syscall interface (with finer policy > rules). But this can be discussed, of course. I think that's a natural place to start. But then you start thinking about the details: this thing needs to be able to manage its own address space, send and receive event channels / interrupts, &c &c -- and it actually ends up looking exactly like a subset of what a stubdomain can already do. In which case -- why invent a new interface, instead of just reusing the existing one? >> The problem with stubdoms is latency and scheduling. It is not >> deterministic. We could easily improve the null scheduler to introduce >> some sort of non-preemptive scheduling of stubdoms on the same pcpus of >> the guest vcpus. It would still require manually pinning vcpus to pcpus. > I see couple of other problems with stubdoms. For example, we need > mechanism to load mediator stubdom before dom0. There are a couple of options here. You could do something like the Xoar project [1] did, and have Xen boot a special-purpose "system builder" domain, which would start both the mediator and then a dom0. Or you could have a mechanism for passing more than one domain / initrd to Xen, and pass Xen both the mediator stubdom as well as the kernel for dom0. [1] tjd.phlegethon.org/words/sosp11-xoar.pdf >> Then, we could add a sched_op hypercall to let the schedulers know that >> a stubdom is tied to a specific guest domain. > What if one stubdom serves multiple domains? This is TEE use case. Then you don't make that hypercall. :-) In any case you certainly can't use an EL0 app for that, at least the way we've been describing it. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-23 9:08 ` George Dunlap @ 2017-05-26 19:43 ` Volodymyr Babchuk 2017-05-26 19:46 ` Volodymyr Babchuk 0 siblings, 1 reply; 49+ messages in thread From: Volodymyr Babchuk @ 2017-05-26 19:43 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, Julien Grall, xen-devel Hi Dario, >>> Explanation of what EL0 apps are. What should be their interface with >>> Xen? Could the interface be the regular hypercall interface? In that >>> case, what's the benefit compared to stubdoms? >> I imagined this as separate syscall interface (with finer policy >> rules). But this can be discussed, of course. > > I think that's a natural place to start. But then you start thinking > about the details: this thing needs to be able to manage its own address > space, send and receive event channels / interrupts, &c &c -- and it > actually ends up looking exactly like a subset of what a stubdomain can > already do. Actually, I don't want it to handle events, interrupts and such. I see it almost as a synchronous function call. For example. when you need something from it, you don't fire interrupt into it. You just set function number in r0, set parameters in r1-r7, set PC to an entry point and you are good to go. > In which case -- why invent a new interface, instead of just reusing the > existing one? Hypercalls (from domains) and syscalls (from apps) are intersecting sets, but neither is subset for other. One can merge them, but then there will be calls that have meaning only for apps and there will be calls that are fine only for domains. Honestly, I have no strong opinion, which approach is better. I see pros and cons for every variant. >>> The problem with stubdoms is latency and scheduling. It is not >>> deterministic. We could easily improve the null scheduler to introduce >>> some sort of non-preemptive scheduling of stubdoms on the same pcpus of >>> the guest vcpus. It would still require manually pinning vcpus to pcpus. >> I see couple of other problems with stubdoms. For example, we need >> mechanism to load mediator stubdom before dom0. > > There are a couple of options here. You could do something like the > Xoar project [1] did, and have Xen boot a special-purpose "system > builder" domain, which would start both the mediator and then a dom0. Wow. That's very interesting idea. >>> Then, we could add a sched_op hypercall to let the schedulers know that >>> a stubdom is tied to a specific guest domain. >> What if one stubdom serves multiple domains? This is TEE use case. > Then you don't make that hypercall. :-) In any case you certainly can't > use an EL0 app for that, at least the way we've been describing it. That depends on how many right you will give to an EL0 app. I think, it is possible to use it for this purpose. But actually, I'd like to see TEE mediator right in hypervisor. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Notes on stubdoms and latency on ARM 2017-05-26 19:43 ` Volodymyr Babchuk @ 2017-05-26 19:46 ` Volodymyr Babchuk 0 siblings, 0 replies; 49+ messages in thread From: Volodymyr Babchuk @ 2017-05-26 19:46 UTC (permalink / raw) To: George Dunlap Cc: Artem_Mygaiev, Stefano Stabellini, Andrii Anisov, Dario Faggioli, Julien Grall, xen-devel On 26 May 2017 at 12:43, Volodymyr Babchuk <vlad.babchuk@gmail.com> wrote: > Hi Dario, > Oops, sorry, George. There were two emails in a row, yours and Dario's, and I overlooked whom I was answering. -- WBR Volodymyr Babchuk aka lorc [+380976646013] mailto: vlad.babchuk@gmail.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2017-07-20 9:25 UTC | newest] Thread overview: 49+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-05-18 19:00 Notes on stubdoms and latency on ARM Stefano Stabellini 2017-05-19 19:45 ` Volodymyr Babchuk 2017-05-22 21:41 ` Stefano Stabellini 2017-05-26 19:28 ` Volodymyr Babchuk 2017-05-30 17:29 ` Stefano Stabellini 2017-05-30 17:33 ` Julien Grall 2017-06-01 10:28 ` Julien Grall 2017-06-17 0:17 ` Volodymyr Babchuk 2017-05-31 9:09 ` George Dunlap 2017-05-31 15:53 ` Dario Faggioli 2017-05-31 16:17 ` Volodymyr Babchuk 2017-05-31 17:45 ` Stefano Stabellini 2017-06-01 10:48 ` Julien Grall 2017-06-01 10:52 ` George Dunlap 2017-06-01 10:54 ` George Dunlap 2017-06-01 12:40 ` Dario Faggioli 2017-06-01 15:02 ` George Dunlap 2017-06-01 18:27 ` Stefano Stabellini 2017-05-31 17:02 ` George Dunlap 2017-06-17 0:14 ` Volodymyr Babchuk 2017-06-19 9:37 ` George Dunlap 2017-06-19 17:54 ` Stefano Stabellini 2017-06-19 18:36 ` Volodymyr Babchuk 2017-06-20 10:11 ` Dario Faggioli 2017-07-07 15:02 ` Volodymyr Babchuk 2017-07-07 16:41 ` Dario Faggioli 2017-07-07 17:03 ` Volodymyr Babchuk 2017-07-07 21:12 ` Stefano Stabellini 2017-07-12 6:14 ` Dario Faggioli 2017-07-17 9:25 ` George Dunlap 2017-07-17 10:04 ` Julien Grall 2017-07-17 11:28 ` George Dunlap 2017-07-19 11:21 ` Julien Grall 2017-07-20 9:25 ` Dario Faggioli 2017-07-20 9:10 ` Dario Faggioli 2017-07-20 8:49 ` Dario Faggioli 2017-07-08 14:26 ` Dario Faggioli 2017-06-20 10:45 ` Julien Grall 2017-06-20 16:23 ` Volodymyr Babchuk 2017-06-21 10:38 ` Julien Grall 2017-06-19 18:26 ` Volodymyr Babchuk 2017-06-20 10:00 ` Dario Faggioli 2017-06-20 10:30 ` George Dunlap 2017-05-23 7:11 ` Dario Faggioli 2017-05-26 20:09 ` Volodymyr Babchuk 2017-05-27 2:10 ` Dario Faggioli 2017-05-23 9:08 ` George Dunlap 2017-05-26 19:43 ` Volodymyr Babchuk 2017-05-26 19:46 ` Volodymyr Babchuk