* Delays on usleep calls
From: Pavlo Suikov @ 2014-01-20 14:10 UTC
To: xen-devel

Hi,

yet another question on soft real time under Xen. The setup looks like this:

Xen: 4.4-rc1 with the credit scheduler
dom0: Linux 3.8.13 with the CFS scheduler
domU: Android 4.3 with Linux kernel 3.8.13 with the CFS scheduler

The test program does nothing but sleep for 30 (5, 500) ms and then print a
timestamp, in an endless loop. Results on a guest OS run without the
hypervisor are pretty accurate, while on a guest OS under the hypervisor
(both in dom0 and domU) we observe a regular delay of 5-15 ms, no matter
what the sleep time is. Configuring the scheduler with different weights
for dom0/domU has no effect whatsoever.

If the setup looks like this (the only change is the Xen scheduler):

Xen: 4.4-rc1 with the sEDF scheduler
dom0: Linux 3.8.13 with the CFS scheduler
domU: Android 4.3 with Linux kernel 3.8.13 with the CFS scheduler

we observe the same delay, but only in domU; dom0 measurements are far more
accurate.

Can anyone suggest what the reason for such misbehaviour could be, and what
can be impacted by it? We came to this test from incorrectly behaving
rendering timers, but it seems to be an issue of its own. If anyone can
suggest more precise tests, that would be appreciated as well; system
activity in the guest OS is the same for the tests with and without Xen.

Thanks in advance!

Suikov Pavlo
GlobalLogic
www.globallogic.com
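A minimal sketch of the kind of test program described above (the actual
program was not posted; the use of CLOCK_MONOTONIC, usleep() and the 30 ms
interval are assumptions made for illustration):

    /* sleeptest.c - measure how long a nominal 30 ms sleep really takes.
     * Build: gcc -O2 -o sleeptest sleeptest.c   (add -lrt on older glibc)
     * Illustrative sketch only, not the program used for the numbers in
     * this thread.
     */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static long long now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
    }

    int main(void)
    {
        const useconds_t sleep_us = 30000;  /* 30 ms; also try 5 ms, 500 ms */

        for (;;) {
            long long t0 = now_us();
            usleep(sleep_us);
            long long t1 = now_us();
            /* Actual elapsed time in ms: about 30-31 on bare metal, while
             * the thread reports values up to 39-46 under Xen. */
            printf("%lld ms\n", (t1 - t0) / 1000);
            fflush(stdout);
        }
        return 0;
    }

cyclictest, suggested in the reply below, measures essentially the same
thing, with knobs for priority, interval and histogram output.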
* Re: Delays on usleep calls
From: Dario Faggioli @ 2014-01-20 15:05 UTC
To: Pavlo Suikov; +Cc: xen-devel

On lun, 2014-01-20 at 16:10 +0200, Pavlo Suikov wrote:
> Hi,
>
Hi again Pavlo,

> yet another question on soft real time under Xen. The setup looks like
> this:
>
> Xen: 4.4-rc1 with the credit scheduler
> dom0: Linux 3.8.13 with the CFS scheduler
> domU: Android 4.3 with Linux kernel 3.8.13 with the CFS scheduler
>
x86 or ARM host? If x86, is DomU HVM or PV?

Also, how many pCPUs and vCPUs do the host and the various guests have?
Are you using any vCPU-to-pCPU pinning?

> The test program does nothing but sleep for 30 (5, 500) ms and then
> print a timestamp, in an endless loop.
>
Ok, so something similar to cyclictest, right?

https://rt.wiki.kernel.org/index.php/Cyclictest

I'm also investigating running it in a bunch of configurations... I'll
have results that we can hopefully compare to yours very soon.

What about giving it a try yourself? I think standardizing on one (a set
of) specific tool could be a good thing.

> Results on a guest OS run without the hypervisor are pretty accurate,
> while on a guest OS under the hypervisor (both in dom0 and domU) we
> observe a regular delay of 5-15 ms, no matter what the sleep time is.
> Configuring the scheduler with different weights for dom0/domU has no
> effect whatsoever.
>
Mmm... Can you show us at least part of the numbers? From my experience,
it's really easy to mess up the terminology in this domain (delay,
latency, jitter, etc.). AFAIUI, you're saying that you're asking for a
sleep time of X, and you're being woken up in the interval
[X+5ms, X+15ms], is that the case?

> If the setup looks like this (the only change is the Xen scheduler):
>
> Xen: 4.4-rc1 with the sEDF scheduler
> dom0: Linux 3.8.13 with the CFS scheduler
> domU: Android 4.3 with Linux kernel 3.8.13 with the CFS scheduler
>
> we observe the same delay, but only in domU; dom0 measurements are far
> more accurate.
>
It would be nice to know, in addition to the information above (arch,
CPUs, pinning, etc.), something about how sEDF is being used in this
particular case. Can you post the output of the following commands?

 # xl list -n
 # xl vcpu-list
 # xl sched-sedf

That being said, sEDF is quite broken, unless used in a very specific
way. Also, even if it were 100% working, your usecase does not seem to
need sEDF (or any real-time scheduling solution) _yet_, as it does not
include any kind of load other than the usleeping task in Dom0 or DomU.
Is that the case? Or do you have some other load in the system while
performing these measurements? If the latter, what and where?

What I mean is as follows. As far as Xen is concerned, if you have a
bunch of VMs, with various and different kinds of load running in them,
and you want to make sure that one of them receives at least some
specific amount of pCPU time (or things like that), then it is important
which scheduler you pick in Xen. If you only have, say, Dom0 and one
DomU, and you are only running your workload (the usleeping task) inside
the DomU, then the vCPU scheduling happening in Xen should not make that
much difference. Of course, in order to be sure of that, we'd need to
know more about the configuration, as I already asked above.
Oh, and now that I think about it, something that is present in credit
and not in sEDF, and that might be worth checking, is the scheduling rate
limiting thing. You can find out more about it in this blog post:

http://blog.xen.org/index.php/2012/04/10/xen-4-2-new-scheduler-parameters-2/

Discussion about it started in this thread (and then continued in a few
other ones):

http://thr3ads.net/xen-devel/2011/10/1129938-PATCH-scheduler-rate-controller

In the code, look for ratelimit_us, in particular inside
xen/common/sched_credit.c, to figure out even better what this does. It
probably isn't the source of your problems (the 'delay' you're seeing
looks too big), but since it's quite cheap to check... :-)

Let us know.
Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
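Checking (and disabling) the rate limiting mentioned above is cheap.
Assuming Xen 4.2 or later, something along these lines should do; a
ratelimit of 0 should disable it, and if a given xl version rejects 0,
the Xen boot option sched_ratelimit_us=0 is the fallback:

    # xl sched-credit            (tslice and ratelimit are shown in the
                                  "Cpupool" header line)
    # xl sched-credit -s -r 0    (set ratelimit_us to 0 for the pool)

An example cyclictest invocation comparable to the 30 ms sleep test (the
priority value here is arbitrary):

    # cyclictest -t1 -p 80 -i 30000 -l 1000 -q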
* Re: Delays on usleep calls
From: Pavlo Suikov @ 2014-01-20 16:05 UTC
To: Dario Faggioli; +Cc: xen-devel

Hi Dario!

> x86 or ARM host?

ARM. ARMv7, a TI Jacinto 6, to be precise.

> Also, how many pCPUs and vCPUs do the host and the various guests have?

2 pCPUs, 4 vCPUs: 2 vCPUs per domain.

> # xl list -n
> # xl vcpu-list

# xl list -n
Name          ID   Mem VCPUs  State   Time(s)  NODE Affinity
Domain-0       0   118     2  r-----     20.5  any node
android_4.3    1  1024     2  -b----    383.5  any node

# xl vcpu-list
Name          ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0       0     0     0  r--       12.6  any cpu
Domain-0       0     1     1  b          7.3  any cpu
android_4.3    1     0     0  b        180.8  any cpu
android_4.3    1     1     0  b        184.4  any cpu

> Are you using any vCPU-to-pCPU pinning?

No.

> What about giving it a try yourself? I think standardizing on one (a set
> of) specific tool could be a good thing.

Yep, we'll try it.

> Mmm... Can you show us at least part of the numbers? From my experience,
> it's really easy to mess up the terminology in this domain (delay,
> latency, jitter, etc.).

A test with a 30 ms sleep period gives these results (measured sleep
times are in ms):

              Measurements   Average   Number of times with t > 32 ms
Credit  Dom0      358          32.5                 72
        DomU      358          40.7                358
sEDF    Dom0      358          33.6                120
        DomU      358          40.7                358

We did additional measurements and, as you can see, my first impression
was not quite correct: a difference between dom0 and domU exists and is
quite observable on a larger scale. On the same setup bare metal, without
Xen, the number of times t > 32 is close to 0; on the setup with Xen but
without the domU system running, the number of times t > 32 is close to 0
as well. We will make additional measurements with Linux (not Android) as
the domU guest, though.

The raw data looks like this (ms):

    Credit        sEDF
 Dom0   DomU   Dom0   DomU
  38     46     38     39
  31     46     39     39
  31     46     39     39
  39     46     39     46
  39     46     39     46
  31     46     31     39
  31     46     31     39
  31     46     31     46
  31     46     31     46
  31     46     31     39
  31     46     31     39
  31     46     39     39
  31     46     39     39
  31     39     31     39
  31     39     31     39

So as you can see, there are peak values both in dom0 and domU.

> AFAIUI, you're saying that you're asking for a sleep time of X, and
> you're being woken up in the interval [X+5ms, X+15ms], is that the
> case?

Yes, that's correct. In the numbers above the sleep period is 30 ms, so
we were expecting a 30-31 ms sleep time, as it is on the same setup
without Xen.

> # xl sched-sedf

# xl sched-sedf
Cpupool Pool-0:
Name          ID  Period  Slice  Latency  Extra  Weight
Domain-0       0     100      0        0      1       0
android_4.3    1     100      0        0      1       0

> Or do you have some other load in the system while performing these
> measurements? If the latter, what and where?

During this test both guests were almost idle.

> Oh, and now that I think about it, something that is present in credit
> and not in sEDF, and that might be worth checking, is the scheduling
> rate limiting thing.

We'll check it out, thanks!

Regards, Pavlo
* Re: Delays on usleep calls
From: Pavlo Suikov @ 2014-01-20 17:31 UTC
To: Dario Faggioli; +Cc: xen-devel

Hi again,

sorry for the broken formatting; see a better one below, where the tables
should be.

[...]

> A test with a 30 ms sleep period gives these results (measured sleep
> times are in ms):
>
>               Measurements   Average   Number of times with t > 32 ms
> Credit  Dom0      358          32.5                 72
>         DomU      358          40.7                358
> sEDF    Dom0      358          33.6                120
>         DomU      358          40.7                358
>
> We did additional measurements and, as you can see, my first impression
> was not quite correct: a difference between dom0 and domU exists and is
> quite observable on a larger scale. On the same setup bare metal,
> without Xen, the number of times t > 32 is close to 0; on the setup
> with Xen but without the domU system running, the number of times
> t > 32 is close to 0 as well. We will make additional measurements with
> Linux (not Android) as the domU guest, though.
>
> The raw data looks like this (ms):
>
> Credit:          sEDF:
> Dom0   DomU      Dom0   DomU
>  38     46        38     39
>  31     46        39     39
>  31     46        39     39
>  39     46        39     46
>  39     46        39     46
>  31     46        31     39
>  31     39        31     39
>
> So as you can see, there are peak values both in dom0 and domU.

[...]
* Re: Delays on usleep calls
From: Dario Faggioli @ 2014-01-21 10:56 UTC
To: Pavlo Suikov; +Cc: xen-devel

On lun, 2014-01-20 at 19:31 +0200, Pavlo Suikov wrote:
> Hi again,
>
> sorry for the broken formatting; see a better one below, where the
> tables should be.
>
Mmm... At least for me, the formatting is broken in this e-mail, and was
ok in the former one :-)

Dario
* Re: Delays on usleep calls
From: Dario Faggioli @ 2014-01-21 11:46 UTC
To: Pavlo Suikov; +Cc: xen-devel

On lun, 2014-01-20 at 18:05 +0200, Pavlo Suikov wrote:
> > x86 or ARM host?
>
> ARM. ARMv7, a TI Jacinto 6, to be precise.
>
Ok.

> > Also, how many pCPUs and vCPUs do the host and the various guests
> > have?
>
> 2 pCPUs, 4 vCPUs: 2 vCPUs per domain.
>
Right. So you are overbooking the platform a bit. Don't get me wrong,
that's not only legitimate, it's actually a good thing, if only because
it gives us something nice to play with from the Xen scheduling
perspective. If you just had #vCPUs == #pCPUs, that would be way more
boring! :-)

That being said, would it be a problem, as a temporary measure during
this first phase of testing and benchmarking, to change it a bit? I'm
asking because I think that could help isolate the various causes of the
issues you're seeing, and hence face and resolve them.

> > Are you using any vCPU-to-pCPU pinning?
>
> No.
>
Ok. So, if, as said above, you can do that, I'd try the following. With
the credit scheduler (after having cleared/disabled the rate limiting
thing), go for 1 vCPU in Dom0 and 1 vCPU in DomU.

Also, pin both, and do it to different pCPUs. I think booting with
"dom0_max_vcpus=1 dom0_vcpus_pin" on the Xen command line would do the
trick for Dom0. For DomU, you just put a "cpus=X" entry in the config
file, as soon as you see which pCPU Dom0 is _not_ pinned to (I suspect
Dom0 will end up pinned to pCPU #0, and so you should use "cpus=1" for
the DomU).

With that configuration, repeat the tests.

Basically, what I'm asking you to do is to completely kick the Xen
scheduler out of the window, for now, to try getting some baseline
numbers. Nicely enough, when using only 1 vCPU for both Dom0 and DomU,
you also pretty much rule out most of the Linux scheduler's logic (not
everything, but at least the part about load balancing). To push even
harder on the latter, I'd boost the priority of the test program (I'm
still talking about inside the Linux guest) to some high rtprio level.

What all the above should give you is an estimation of the current lower
bound on the latency and jitter that you can get. If that's already not
good enough (provided I did not make any glaring mistake in the
instructions above :-D), then we know that there are areas other than
the scheduler that need some intervention, and we can start looking for
which ones, and what to do. Also, whether or not what you get is enough,
one can also start working on seeing which scheduler, and/or which set
of scheduling parameters, is able to replicate, or get close enough,
reliably enough, to that 'static scenario'.

What do you think?

> We did additional measurements and, as you can see, my first impression
> was not quite correct: a difference between dom0 and domU exists and is
> quite observable on a larger scale. On the same setup bare metal,
> without Xen, the number of times t > 32 is close to 0; on the setup
> with Xen but without the domU system running, the number of times
> t > 32 is close to 0 as well.
>
I appreciate that. Given the many actors and factors involved, I think
the only way to figure out what's going on is to try isolating the
various components as much as we can...
That's why I'm suggesting to consider a very, very simple situation
first, at least wrt scheduling.

> We will make additional measurements with Linux (not Android) as the
> domU guest, though.
>
Ok.

> > # xl sched-sedf
>
> # xl sched-sedf
> Cpupool Pool-0:
> Name          ID  Period  Slice  Latency  Extra  Weight
> Domain-0       0     100      0        0      1       0
> android_4.3    1     100      0        0      1       0
>
May I ask for the output of

 # xl list -n
 # xl vcpu-list

in the sEDF case too?

That being said, I suggest you not spend much time on sEDF for now. As it
is, it's broken, especially on SMPs, so we either re-engineer it
properly, or turn toward RT-Xen (and, e.g., help Sisu and his team to
upstream it). I think we should have a discussion about this, outside and
beyond this thread... I'll bring it up in the proper way ASAP.

> > Oh, and now that I think about it, something that is present in
> > credit and not in sEDF, and that might be worth checking, is the
> > scheduling rate limiting thing.
>
> We'll check it out, thanks!
>
Right. One other thing that I forgot to mention: the timeslice. Credit
uses, by default, 30 ms as its scheduling timeslice which, I think, is
quite high for latency sensitive workloads like yours (Linux typically
uses 1, 3.33, 4 or 10).

 # xl sched-credit
 Cpupool Pool-0: tslice=30ms ratelimit=1000us
 Name               ID  Weight  Cap
 Domain-0            0     256    0
 vm.guest.osstest    9     256    0

I think that another thing worth trying is running the experiments with
that lowered a bit. E.g.:

 # xl sched-credit -s -t 1
 # xl sched-credit
 Cpupool Pool-0: tslice=1ms ratelimit=1000us
 Name               ID  Weight  Cap
 Domain-0            0     256    0
 vm.guest.osstest    9     256    0

Regards,
Dario
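Putting the suggestions above together, the pinned single-vCPU baseline
would look roughly like this. Illustrative only: the exact pCPU numbers,
combining -r and -t in one xl call, and the sleeptest binary name (the
sketch from earlier in the thread) are assumptions:

    # Xen command line, e.g. in the bootloader entry:
    #   ... dom0_max_vcpus=1 dom0_vcpus_pin

    # DomU config file (assuming Dom0 ends up pinned to pCPU 0):
    vcpus = 1
    cpus  = "1"

    # Before the runs, disable rate limiting and shrink the timeslice:
    # xl sched-credit -s -r 0 -t 1

    # Inside the guest, run the test at a real-time priority, e.g.:
    # chrt -f 80 ./sleeptest
    # (or: cyclictest -t1 -p 80 -i 30000 -l 1000 -q)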
* Re: Delays on usleep calls
From: Pavlo Suikov @ 2014-01-21 15:53 UTC
To: Dario Faggioli; +Cc: xen-devel

Hi,

> Ok. So, if, as said above, you can do that, I'd try the following. With
> the credit scheduler (after having cleared/disabled the rate limiting
> thing), go for 1 vCPU in Dom0 and 1 vCPU in DomU.
>
> Also, pin both, and do it to different pCPUs. I think booting with
> "dom0_max_vcpus=1 dom0_vcpus_pin" on the Xen command line would do the
> trick for Dom0. For DomU, you just put a "cpus=X" entry in the config
> file, as soon as you see which pCPU Dom0 is _not_ pinned to (I suspect
> Dom0 will end up pinned to pCPU #0, and so you should use "cpus=1" for
> the DomU).
>
> With that configuration, repeat the tests.

It turned out that we cannot start dom0 with one vCPU (investigation of
this is still ongoing), but we succeeded in giving one vCPU to domU and
pinning it to one of the pCPUs. Interestingly enough, that fixed all the
latency observed in domU.

# xl vcpu-list
Name          ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0       0     0     0  ---       38.6  0
Domain-0       0     1     0  r--       31.8  0
android_4.3    1     0     1  b        230.2  1

In dom0 (which has two vCPUs, so Xen scheduling is actually used) the
latency is still present.

So without virtualization of CPUs for domU, the soft real time properties
with regard to timers are met (and our RTP audio sync is doing much
better). That's obviously not a solution, but it shows that Xen credit
(and sEDF) scheduling is actually misbehaving on such tasks, and that
there is an area to investigate.

I will keep you informed when more results are available, so stay
tuned :)

Regards, Pavlo
* Re: Delays on usleep calls
From: Dario Faggioli @ 2014-01-21 17:56 UTC
To: Pavlo Suikov; +Cc: xen-devel

On mar, 2014-01-21 at 17:53 +0200, Pavlo Suikov wrote:
> It turned out that we cannot start dom0 with one vCPU (investigation of
> this is still ongoing),
>
Oh, I see. Weird.

> but we succeeded in giving one vCPU to domU and pinning it to one of
> the pCPUs. Interestingly enough, that fixed all the latency observed
> in domU.
>
Can I ask how the numbers (for DomU, of course) look now?

> # xl vcpu-list
> Name          ID  VCPU   CPU  State  Time(s)  CPU Affinity
> Domain-0       0     0     0  ---       38.6  0
> Domain-0       0     1     0  r--       31.8  0
> android_4.3    1     0     1  b        230.2  1
>
> In dom0 (which has two vCPUs, so Xen scheduling is actually used) the
> latency is still present.
>
Did you try reducing the scheduling timeslice to 1 ms (or even something
bigger, but less than 30)? If yes, down to which value?

Another thing I'd try, if you haven't done that already, is as follows:
 - get rid of the DomU
 - pin the 2 Dom0 vCPUs each to one pCPU
 - repeat the experiment

If that works, iterate without the second step, i.e., basically, run the
experiment with no pinning, but only with Dom0.

What I'm after is this: since you report that DomU performance starts to
be satisfactory when pinning is used, I'd like to figure out whether that
is the case for Dom0 too, as it should be, or whether there is something
interfering with that. I know, if the DomU is idle while running the load
in Dom0, having it or not should not make much difference, but still, I'd
give this a try.

> So without virtualization of CPUs for domU, the soft real time
> properties with regard to timers are met (and our RTP audio sync is
> doing much better). That's obviously not a solution, but it shows that
> Xen credit (and sEDF) scheduling is actually misbehaving on such tasks,
> and that there is an area to investigate.
>
Yes. Well, I think it says that, at least for your usecase, the latencies
introduced by virtualization per se (VMEXITs, or whatever the ARM
equivalent of them is, etc.) are not showstoppers... which is already
something! :-)

What we now need to figure out is with which scheduler(s), and, for each
scheduler, with which parameters, that property is preserved. And, I
agree, this is something to be investigated.

> I will keep you informed when more results are available, so stay
> tuned :)
>
Thanks for all the updates so far... I definitely will stay tuned. :-)

Regards,
Dario
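For reference, pinning the two Dom0 vCPUs as suggested above can be done
at runtime with xl; the pCPU numbers here are just an example:

    # xl vcpu-pin Domain-0 0 0    (vCPU 0 of Dom0 on pCPU 0)
    # xl vcpu-pin Domain-0 1 1    (vCPU 1 of Dom0 on pCPU 1)
    # xl vcpu-list                (verify the "CPU Affinity" column)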
* Re: Delays on usleep calls
From: Pavlo Suikov @ 2014-01-23 19:09 UTC
To: Dario Faggioli; +Cc: xen-devel

Hi Dario,

> Can I ask how the numbers (for DomU, of course) look now?

They are all 31 ms, so minimal overhead is achieved. However, it looks
like we still have some gremlins in there: from boot to boot this time
can change to 39 ms. So without the Xen scheduler being active, the sleep
latency stabilizes, but not always at a correct value.

> Another thing I'd try, if you haven't done that already, is as follows:
>  - get rid of the DomU
>  - pin the 2 Dom0 vCPUs each to one pCPU
>  - repeat the experiment

Yes, we have tried this as well, and it gives almost the same result as
in the previous case: the sleep latency in dom0 is not present, so we get
30 to 31 ms on a 30 ms sleep, without any variations.

> If that works, iterate without the second step, i.e., basically, run
> the experiment with no pinning, but only with Dom0.

Still good results. The problems show up only when we have a domU
actually running.

Regards, Pavlo
* Re: Delays on usleep calls
From: Dario Faggioli @ 2014-01-24 17:08 UTC
To: Pavlo Suikov; +Cc: xen-devel

On gio, 2014-01-23 at 21:09 +0200, Pavlo Suikov wrote:
> Hi Dario,
>
> > Can I ask how the numbers (for DomU, of course) look now?
>
> They are all 31 ms, so minimal overhead is achieved. However, it looks
> like we still have some gremlins in there: from boot to boot this time
> can change to 39 ms. So without the Xen scheduler being active, the
> sleep latency stabilizes, but not always at a correct value.
>
Wow... So, you're saying that, with the DomU exclusively pinned to a
specific pCPU, the latency is stable, but the value at which it
stabilizes varies from boot to boot? That's very weird...

> > Another thing I'd try, if you haven't done that already, is as
> > follows:
> >  - get rid of the DomU
> >  - pin the 2 Dom0 vCPUs each to one pCPU
> >  - repeat the experiment
>
> Yes, we have tried this as well, and it gives almost the same result as
> in the previous case: the sleep latency in dom0 is not present, so we
> get 30 to 31 ms on a 30 ms sleep, without any variations.
>
Ok. Are you aware of xentrace and xenalyze? Have, for example, a look
here:

http://blog.xen.org/index.php/2012/09/27/tracing-with-xentrace-and-xenalyze/

Perhaps, in this Dom0 case, you could start the tracing, start the DomU
and then run the experiments. It's going to be tough to correlate the
actual activity in Dom0 (the test running inside it) with Xen's traces,
but at least you should be able to see when, and try to figure out why,
the DomU, which should be just idle, ends up getting in your way.

Regards,
Dario
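A rough outline of such a tracing session follows; the event mask (meant
to cover the scheduler trace classes) and the xenalyze option are
assumptions and may differ between versions, so check the tools' help
output and the blog post above:

    # In Dom0: capture scheduler trace records while the test runs,
    # then stop xentrace with Ctrl-C.
    # xentrace -e 0x0002f000 /tmp/trace.bin

    # Post-process the binary trace with xenalyze:
    # xenalyze --summary /tmp/trace.bin > summary.txt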
* Re: Delays on usleep calls
From: Robbie VanVossen @ 2014-02-05 21:30 UTC
To: Dario Faggioli; +Cc: Pavlo Suikov, Nate Studer, xen-devel

On 1/20/2014 10:05 AM, Dario Faggioli wrote:
>> The test program does nothing but sleep for 30 (5, 500) ms and then
>> print a timestamp, in an endless loop.
>
> Ok, so something similar to cyclictest, right?
>
> https://rt.wiki.kernel.org/index.php/Cyclictest
>
> I'm also investigating running it in a bunch of configurations... I'll
> have results that we can hopefully compare to yours very soon.
>
> What about giving it a try yourself? I think standardizing on one (a set
> of) specific tool could be a good thing.

Dario,

We thought we would try to get some similar readings for the Arinc653
scheduler. We followed your suggestions from this thread and have gotten
some readings for the following configurations:

----------------
Configuration 1 - Only Domain-0
Xen: 4.4-rc2 - Arinc653 Scheduler
Domain-0: Ubuntu 12.04.1 - Linux 3.2.0-35

xl list -n:
Name        ID   Mem VCPUs  State   Time(s)  NODE Affinity
Domain-0     0  1535     1  r-----    100.2  all

xl vcpu-list:
Name        ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0     0     0     0  r--      103.0  all

----------------
Configuration 2 - Domain-0 and unscheduled guest
Xen: 4.4-rc2 - Arinc653 Scheduler
Domain-0: Ubuntu 12.04.1 - Linux 3.2.0-35
dom1: Ubuntu 12.04.1 - Linux 3.2.0-35

xl list -n:
Name        ID   Mem VCPUs  State   Time(s)  NODE Affinity
Domain-0     0  1535     1  r-----    146.4  all
dom1         1   512     1  ------      0.0  all

xl vcpu-list:
Name        ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0     0     0     0  r--      147.4  all
dom1         1     0     0  ---        0.0  all

----------------
Configuration 3 - Domain-0 and scheduled guest (in separate CPU pools)
Xen: 4.4-rc2 - Credit Scheduler
Pool: Pool-0 - Credit Scheduler
  Domain-0: Ubuntu 12.04.1 - Linux 3.2.0-35
Pool: arinc - Arinc653 Scheduler
  dom1: Ubuntu 12.04.1 - Linux 3.2.0-35

xl list -n:
Name        ID   Mem VCPUs  State   Time(s)  NODE Affinity
Domain-0     0  1535     2  r-----    111.5  all
dom1         1   512     1  -b----      4.3  all

xl vcpu-list:
Name        ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0     0     0     0  -b-       81.0  all
Domain-0     0     1     0  r--       47.1  all
dom1         1     0     1  -b-        4.7  all

----------------

We used the following command to get results for a 30 millisecond
(30,000 us) interval with 500 loops:

cyclictest -t1 -i 30000 -l 500 -q

Results:

+--------+--------+-----------+-------+-------+-------+
| Config | Domain | Scheduler |      Latency (us)     |
|        |        |           |  Min  |  Max  |  Avg  |
+--------+--------+-----------+-------+-------+-------+
|   1    |   0    | Arinc653  |   20  |  163  |   68  |
|   2    |   0    | Arinc653  |   21  |  173  |   68  |
|   3    |   1    | Arinc653  |   20  |  155  |   75  |
+--------+--------+-----------+-------+-------+-------+

It looks like we get negligible latencies for each of these simplistic
configurations.

Thanks,
Robbie VanVossen
DornerWorks, Ltd.
Embedded Systems Engineering
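For anyone wanting to reproduce configuration 3, the CPU pool split can
be set up roughly as follows. The pool name, the CPU numbers and the
exact cpupool config syntax are assumptions and may vary by Xen version;
note also that the Arinc653 scheduler handles a single pCPU per pool:

    # arinc-pool.cfg
    name  = "arinc"
    sched = "arinc653"
    cpus  = ["1"]

    # xl cpupool-cpu-remove Pool-0 1     (free pCPU 1 from the default pool)
    # xl cpupool-create arinc-pool.cfg   (create the Arinc653 pool on it)
    # xl cpupool-migrate dom1 arinc      (move the guest into the new pool)

Alternatively, the guest can be started directly in the pool by adding
pool="arinc" to its domain config file.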
* Re: Delays on usleep calls
From: Dario Faggioli @ 2014-02-07 9:22 UTC
To: Robbie VanVossen; +Cc: Pavlo Suikov, Nate Studer, xen-devel

On mer, 2014-02-05 at 16:30 -0500, Robbie VanVossen wrote:
> On 1/20/2014 10:05 AM, Dario Faggioli wrote:
> > What about giving it a try yourself? I think standardizing on one (a
> > set of) specific tool could be a good thing.
>
> Dario,
>
Hey! :-)

> We thought we would try to get some similar readings for the Arinc653
> scheduler. We followed your suggestions from this thread and have
> gotten some readings for the following configurations:
>
That's cool, thanks for doing this and sharing the results.

> We used the following command to get results for a 30 millisecond
> (30,000 us) interval with 500 loops:
>
> cyclictest -t1 -i 30000 -l 500 -q
>
> Results:
>
> +--------+--------+-----------+-------+-------+-------+
> | Config | Domain | Scheduler |      Latency (us)     |
> |        |        |           |  Min  |  Max  |  Avg  |
> +--------+--------+-----------+-------+-------+-------+
> |   1    |   0    | Arinc653  |   20  |  163  |   68  |
> |   2    |   0    | Arinc653  |   21  |  173  |   68  |
> |   3    |   1    | Arinc653  |   20  |  155  |   75  |
> +--------+--------+-----------+-------+-------+-------+
>
> It looks like we get negligible latencies for each of these simplistic
> configurations.
>
It does indeed. You're right, the configurations are simplistic. Yet, as
stated before, latency and jitter in scheduling/event response have two
major contributors: one is the scheduling algorithm itself, the other is
the interrupt/event delivery latency of the platform (HW + hypervisor +
OS). This means that, of course, you need to pick the right scheduler and
configure it properly, but there may well be other sources of latency and
delay, and those are what set the lowest possible limit, unless you go
chasing and fixing such 'platform issues'.

From your experiments (and from some other numbers I also have), it looks
like this lower bound is not terrible in Xen, which is something good to
know... So thanks again for taking the time to run the benchmarks and
share the results! :-D

That being said, especially if we compare to bare metal, I think there is
some room for improvement (I mean, there will always be an overhead, but
still...). Do you, by any chance, have the figures for cyclictest on
Linux bare metal too (on the same hardware and kernel, if possible)?

Thanks a lot again!
Dario
* Re: Delays on usleep calls
From: Robbie VanVossen @ 2014-02-13 21:09 UTC
To: Dario Faggioli; +Cc: Pavlo Suikov, Nate Studer, xen-devel

On 2/7/2014 4:22 AM, Dario Faggioli wrote:
> From your experiments (and from some other numbers I also have), it
> looks like this lower bound is not terrible in Xen, which is something
> good to know... So thanks again for taking the time to run the
> benchmarks and share the results! :-D
>
> That being said, especially if we compare to bare metal, I think there
> is some room for improvement (I mean, there will always be an overhead,
> but still...). Do you, by any chance, have the figures for cyclictest
> on Linux bare metal too (on the same hardware and kernel, if possible)?

Dario,

Here is an updated table:

+--------+--------+-----------+-----+-------+-----+
| Config | Domain | Scheduler |    Latency (us)   |
|        |        |           | Min |  Max  | Avg |
+--------+--------+-----------+-----+-------+-----+
|   0    |   NA   |    CFS    |   4 |    35 |  10 |
|   1    |   0    | Arinc653  |  20 |   163 |  68 |
|   2    |   0    | Arinc653  |  21 |   173 |  68 |
|   3    |   0    |  Credit   |  23 |  1041 |  87 |
|   3    |   1    | Arinc653  |  20 |   155 |  75 |
+--------+--------+-----------+-----+-------+-----+

Configuration 0 is the same kernel as before, but running on bare metal,
as requested. As expected, these values are lower than the other results.
I also added the results of running cyclictest in dom0 in configuration
3. In this configuration, dom0 was running under the credit scheduler in
a separate CPU pool from the guest.

On another note, I attempted to get the same measurements for a Linux
kernel with the real-time patch applied. Here are the results:

-------------------
Configuration 0 - Bare Metal Kernel
Ubuntu 12.04.1 - Linux 3.2.24-rt38

-------------------
Configuration 1 - Only Domain-0
Xen: 4.4-rc2 - Arinc653 Scheduler
Domain-0: Ubuntu 12.04.1 - Linux 3.2.24-rt38

xl list -n:
Name        ID   Mem VCPUs  State   Time(s)  NODE Affinity
Domain-0     0  1535     1  r-----     30.9  all

xl vcpu-list:
Name        ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0     0     0     0  r--       35.5  all

-------------------
Configuration 2 - Domain-0 and unscheduled guest
Xen: 4.4-rc2 - Arinc653 Scheduler
Domain-0: Ubuntu 12.04.1 - Linux 3.2.24-rt38
dom1: Ubuntu 12.04.1 - Linux 3.2.24-rt38

xl list -n:
Name        ID   Mem VCPUs  State   Time(s)  NODE Affinity
Domain-0     0  1535     1  r-----     39.7  all
dom1         1   512     1  ------      0.0  all

xl vcpu-list:
Name        ID  VCPU   CPU  State  Time(s)  CPU Affinity
Domain-0     0     0     0  r--       40.5  all
dom1         1     0     0  ---        0.0  all

-------------------

Command used:

cyclictest -t1 -p 1 -i 30000 -l 500 -q

Results:

+--------+--------+-----------+-----+-----+-----+
| Config | Domain | Scheduler |  Latency (us)   |
|        |        |           | Min | Max | Avg |
+--------+--------+-----------+-----+-----+-----+
|   0    |   NA   |    CFS    |   3 |   8 |   5 |
|   1    |   0    | Arinc653  |  20 | 160 |  68 |
|   2    |   0    | Arinc653  |  18 | 150 |  66 |
+--------+--------+-----------+-----+-----+-----+

I couldn't seem to boot into the guest using the kernel with the
real-time patch applied, which is why I didn't replicate configuration 3.

Robbie VanVossen
DornerWorks, Ltd.