* [BUG] Bugs in Xen's credit scheduler cause long tail latency issues
From: Tony S
Date: 2016-05-15 4:11 UTC
To: xen-devel

Hi all,

When I was running latency-sensitive applications in VMs on Xen, I found
some bugs in the credit scheduler that cause long tail latency in
I/O-intensive VMs.

(1) Problem description

------------Description------------
My test environment is as follows: hypervisor (Xen 4.5.0), Dom 0 (Linux
3.18.21), Dom U (Linux 3.18.21).

Environment setup:
We created two 1-vCPU, 4GB-memory VMs and pinned them onto one physical
CPU core. One VM (denoted as I/O-VM) ran the Sockperf server program; the
other VM (denoted as CPU-VM) ran a compute-bound task, e.g., SPEC CPU 2006
or simply a loop. A client on another physical machine sent UDP requests
to the I/O-VM.

Here are my tail latency results (microseconds):

Case    Avg     90%     99%     99.9%   99.99%
#1      108     114     128     129     130
#2      7811    13892   14874   15315   16383
#3      943     131     21755   26453   26553
#4      116     96      105     8217    13472
#5      116     117     129     131     132

Bugs 1, 2, and 3 are discussed below.

Case #1:
I/O-VM was processing Sockperf requests from clients; CPU-VM was idling
(no processes running).

Case #2:
I/O-VM was processing Sockperf requests from clients; CPU-VM was running
a compute-bound task.
Hypervisor is the native Xen 4.5.0.

Case #3:
I/O-VM was processing Sockperf requests from clients; CPU-VM was running
a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bug 1 fixed.

Case #4:
I/O-VM was processing Sockperf requests from clients; CPU-VM was running
a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bugs 1 & 2 fixed.

Case #5:
I/O-VM was processing Sockperf requests from clients; CPU-VM was running
a compute-bound task.
Hypervisor is the native Xen 4.5.0 with bugs 1, 2 & 3 fixed.
---------------------------------------

(2) Problem analysis

------------Analysis----------------

[Bug 1]: The VCPU that ran the CPU-intensive workload could be mistakenly
boosted due to CPU affinity.

http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html

We have already discussed this bug and a potential patch in the above
link. Although the discussed patch improved the tail latency, i.e.,
reducing the 90th-percentile latency, the long tail latency is still not
bounded. Below, we discuss two new bugs that inflate latency at the very
far end of the tail.

[Bug 2]: In csched_acct() (run by default every 30ms), a VCPU stops
earning credits and is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked()) if its credit is larger than the upper
bound. Because the domain has only one VCPU, the domain will also be
removed from the active domain list.

Every 10ms, csched_tick() --> csched_vcpu_acct() -->
__csched_vcpu_acct_start() is executed and tries to put inactive VCPUs
back on the active list. However, __csched_vcpu_acct_start() only puts
the current VCPU back on the active list. If an I/O-bound VCPU is not the
current VCPU at csched_tick() time, it will not be put back on the active
VCPU list. If so, the I/O-bound VCPU will likely miss the next credit
refill in csched_acct() and can easily enter the OVER state. As such, the
I/O-bound VM cannot be boosted and suffers very long latency. It takes at
least one time slice (e.g., 30ms) before the I/O VM is activated again
and starts to receive credits.

[Possible Solution] Activate all inactive VCPUs back to the active list
before the next credit refill, instead of just the current VCPU.
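
A rough sketch of that idea (illustrative only: the inactive-vCPU list
and its hook point in csched_acct() are hypothetical, invented for this
sketch; only the csched_* functions named above are real):

/*
 * Sketch only, NOT actual sched_credit.c code.  It assumes a hypothetical
 * per-scheduler list (prv->inactive_vcpu, linked through a hypothetical
 * svc->inactive_elem field) onto which __csched_vcpu_acct_stop_locked()
 * would park de-activated vCPUs.  The activation helper is the existing
 * one, just called for every parked vCPU rather than only the current one.
 */
static void csched_reactivate_inactive(struct csched_private *prv)
{
    struct csched_vcpu *svc, *tmp;

    /* _safe: activation would unlink each vCPU from the parked list. */
    list_for_each_entry_safe( svc, tmp, &prv->inactive_vcpu, inactive_elem )
        __csched_vcpu_acct_start(prv, svc);
}

/*
 * Hypothetical call site: at the top of csched_acct(), before credit for
 * the period is divided among active domains, so that a vCPU parked
 * mid-period still takes part in the refill:
 *
 *     csched_reactivate_inactive(prv);
 *     ... existing credit redistribution ...
 */
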
[Bug 3]: The BOOST priority might be changed to UNDER before the boosted
VCPU preempts the currently running VCPU. If so, VCPU boosting cannot
take effect.

If a VCPU is in the UNDER state and wakes up from sleep, it will be
boosted in csched_vcpu_wake(). However, the boosting succeeds only when
__runq_tickle() preempts the current VCPU. It is possible for
csched_acct() to run between csched_vcpu_wake() and __runq_tickle(),
which will sometimes change the BOOST state back to UNDER if credit > 0.
If so, __runq_tickle() can fail, as a VCPU in UNDER cannot preempt
another UNDER VCPU. This also contributes to the far end of the long
tail latency.

[Possible Solution]
1. Add a lock to prevent csched_acct() from interleaving with
   csched_vcpu_wake(); or
2. separate the BOOST state from the UNDER and OVER states (see the
   sketch below).
---------------------------------------
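
For option 2, the direction might look like this (a minimal,
self-contained sketch with invented type and field names, not a patch
against sched_credit.c):

#include <stdbool.h>

/*
 * Sketch only, with invented names.  The idea: make "boosted" a flag that
 * is independent of the credit-derived UNDER/OVER state, so csched_acct()
 * recomputing that state from the credit value can no longer erase a
 * wakeup boost before __runq_tickle() has had a chance to preempt.
 */
enum credit_state { STATE_UNDER, STATE_OVER };

struct vcpu_sketch {
    enum credit_state state;   /* recomputed by csched_acct() from credit */
    bool boosted;              /* set in csched_vcpu_wake(); cleared only
                                  once the vCPU has actually been run     */
};

/* Effective priority for runqueue ordering: lower value runs first. */
static int effective_prio(const struct vcpu_sketch *v)
{
    if ( v->boosted )
        return 0;                              /* BOOST outranks all */
    return (v->state == STATE_UNDER) ? 1 : 2;
}

With this split, the accounting path only recomputes state from the
credit value and the wakeup path only sets boosted, so csched_acct() can
no longer demote a freshly boosted VCPU to UNDER before __runq_tickle()
runs (the flag itself would still need the usual locking in real code).
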
Please confirm these bugs.
Thanks.

--
Tony S
Ph.D. student, University of Colorado, Colorado Springs


* Re: [BUG] Bugs in Xen's credit scheduler cause long tail latency issues
From: Dario Faggioli
Date: 2016-05-16 11:30 UTC
To: Tony S, xen-devel
Cc: George Dunlap

[Adding George, and avoiding trimming, for his benefit]

On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
> Hi all,
>
Hi Tony,

> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler that cause long tail latency
> in I/O-intensive VMs.
>
Ok, first of all, thanks for looking into and reporting this.

This is certainly something we need to think about... For now, just a
couple of questions.

> (1) Problem description
>
> ------------Description------------
> My test environment is as follows: hypervisor (Xen 4.5.0), Dom 0
> (Linux 3.18.21), Dom U (Linux 3.18.21).
>
> Environment setup:
> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
> physical CPU core. One VM (denoted as I/O-VM) ran the Sockperf server
> program; the other VM (denoted as CPU-VM) ran a compute-bound task,
> e.g., SPEC CPU 2006 or simply a loop. A client on another physical
> machine sent UDP requests to the I/O-VM.
>
So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
*both* pinned on the *same* pCPU, is this the case?

> Here are my tail latency results (microseconds):
>
> Case    Avg     90%     99%     99.9%   99.99%
> #1      108     114     128     129     130
> #2      7811    13892   14874   15315   16383
> #3      943     131     21755   26453   26553
> #4      116     96      105     8217    13472
> #5      116     117     129     131     132
>
> Bugs 1, 2, and 3 are discussed below.
>
> Case #1:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> idling (no processes running).
>
> Case #2:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0.
>
> Case #3:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bug 1 fixed.
>
> Case #4:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bugs 1 & 2 fixed.
>
> Case #5:
> I/O-VM was processing Sockperf requests from clients; CPU-VM was
> running a compute-bound task.
> Hypervisor is the native Xen 4.5.0 with bugs 1, 2 & 3 fixed.
> ---------------------------------------
>
> (2) Problem analysis
>
> ------------Analysis----------------
>
> [Bug 1]: The VCPU that ran the CPU-intensive workload could be
> mistakenly boosted due to CPU affinity.
>
> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html
>
> We have already discussed this bug and a potential patch in the above
> link. Although the discussed patch improved the tail latency, i.e.,
> reducing the 90th-percentile latency, the long tail latency is still
> not bounded. Below, we discuss two new bugs that inflate latency at
> the very far end of the tail.
>
Right, and there is a fix upstream for this. It's not the patch you
proposed in the thread linked above, but it should have had the same
effect.

Can you perhaps try something more recent than 4.5 (4.7-rc would be
great) and confirm that the numbers still look similar?
About this below here, I'll read carefully and think about it. Thanks
again.

> [Bug 2]: In csched_acct() (run by default every 30ms), a VCPU stops
> earning credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked()) if its credit is larger than the
> upper bound. Because the domain has only one VCPU, the domain will
> also be removed from the active domain list.
>
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() is executed and tries to put inactive
> VCPUs back on the active list. However, __csched_vcpu_acct_start()
> only puts the current VCPU back on the active list. If an I/O-bound
> VCPU is not the current VCPU at csched_tick() time, it will not be
> put back on the active VCPU list. If so, the I/O-bound VCPU will
> likely miss the next credit refill in csched_acct() and can easily
> enter the OVER state. As such, the I/O-bound VM cannot be boosted and
> suffers very long latency. It takes at least one time slice (e.g.,
> 30ms) before the I/O VM is activated again and starts to receive
> credits.
>
> [Possible Solution] Activate all inactive VCPUs back to the active
> list before the next credit refill, instead of just the current VCPU.
>
> [Bug 3]: The BOOST priority might be changed to UNDER before the
> boosted VCPU preempts the currently running VCPU. If so, VCPU
> boosting cannot take effect.
>
> If a VCPU is in the UNDER state and wakes up from sleep, it will be
> boosted in csched_vcpu_wake(). However, the boosting succeeds only
> when __runq_tickle() preempts the current VCPU. It is possible for
> csched_acct() to run between csched_vcpu_wake() and __runq_tickle(),
> which will sometimes change the BOOST state back to UNDER if
> credit > 0. If so, __runq_tickle() can fail, as a VCPU in UNDER
> cannot preempt another UNDER VCPU. This also contributes to the far
> end of the long tail latency.
>
> [Possible Solution]
> 1. Add a lock to prevent csched_acct() from interleaving with
>    csched_vcpu_wake(); or
> 2. separate the BOOST state from the UNDER and OVER states.
>
> Please confirm these bugs.
> Thanks.
>
> --
> Tony S
> Ph.D. student, University of Colorado, Colorado Springs

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

* Re: [BUG] Bugs in Xen's credit scheduler cause long tail latency issues
From: Tony S
Date: 2016-05-16 18:22 UTC
To: Dario Faggioli, xen-devel
Cc: George Dunlap

On Mon, May 16, 2016 at 5:30 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> [Adding George, and avoiding trimming, for his benefit]
>
> On Sat, 2016-05-14 at 22:11 -0600, Tony S wrote:
>> Hi all,
>>
> Hi Tony,
>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler that cause long tail latency
>> in I/O-intensive VMs.
>>
> Ok, first of all, thanks for looking into and reporting this.
>
> This is certainly something we need to think about... For now, just a
> couple of questions.

Hi Dario,

Thank you for your reply. :-)

>> Environment setup:
>> We created two 1-vCPU, 4GB-memory VMs and pinned them onto one
>> physical CPU core. [...]
>>
> So, just to be sure I've understood, you have 2 VMs, each with 1 vCPU,
> *both* pinned on the *same* pCPU, is this the case?

Yes.

>> [...]
>>
>> [Bug 1]: The VCPU that ran the CPU-intensive workload could be
>> mistakenly boosted due to CPU affinity.
>>
>> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02853.html
>>
>> We have already discussed this bug and a potential patch in the above
>> link. [...]
>>
> Right, and there is a fix upstream for this. It's not the patch you
> proposed in the thread linked above, but it should have had the same
> effect.
> Can you perhaps try something more recent than 4.5 (4.7-rc would be
> great) and confirm that the numbers still look similar?

I have tried the latest stable version, Xen 4.6, today. Here are my
results:

Case    Avg     90%      99%      99.9%    99.99%
#1      91      93       101      105      110
#2      22506   43011    231946   259501   265561
#3      917     95       25257    30048    30756
#4      110     95       102      12448    13255
#5      114     118      130      134      136

It seems that case #2 is much worse; the other cases are similar. For
Xen 4.7-rc, I have some installation issues on my machine, so I have
not tried it yet.

My raw latency data is pasted below. I hope it helps you understand the
issues better. :-)

# case 1:
sockperf: ====> avg-lat= 91.688 (std-dev=2.950)
sockperf: ---> <MAX> observation = 110.647
sockperf: ---> percentile 99.99 = 110.647
sockperf: ---> percentile 99.90 = 105.242
sockperf: ---> percentile 99.50 = 101.531
sockperf: ---> percentile 99.00 = 101.066
sockperf: ---> percentile 95.00 = 97.016
sockperf: ---> percentile 90.00 = 93.294
sockperf: ---> percentile 75.00 = 92.157
sockperf: ---> percentile 50.00 = 91.437
sockperf: ---> percentile 25.00 = 90.786
sockperf: ---> <MIN> observation = 73.071

# case 2:
sockperf: ====> avg-lat=90019.931 (std-dev=136620.722)
sockperf: ---> <MAX> observation = 637712.152
sockperf: ---> percentile 99.99 = 637712.152
sockperf: ---> percentile 99.90 = 632901.547
sockperf: ---> percentile 99.50 = 615972.778
sockperf: ---> percentile 99.00 = 599698.318
sockperf: ---> percentile 95.00 = 428857.020
sockperf: ---> percentile 90.00 = 259316.760
sockperf: ---> percentile 75.00 = 114029.044
sockperf: ---> percentile 50.00 = 24629.429
sockperf: ---> percentile 25.00 = 10368.731
sockperf: ---> <MIN> observation = 81.046

# case 3:
sockperf: ====> avg-lat=917.394 (std-dev=3943.142)
sockperf: ---> <MAX> observation = 30756.289
sockperf: ---> percentile 99.99 = 30756.289
sockperf: ---> percentile 99.90 = 30048.372
sockperf: ---> percentile 99.50 = 25962.687
sockperf: ---> percentile 99.00 = 25257.746
sockperf: ---> percentile 95.00 = 5615.028
sockperf: ---> percentile 90.00 = 95.726
sockperf: ---> percentile 75.00 = 92.916
sockperf: ---> percentile 50.00 = 90.387
sockperf: ---> percentile 25.00 = 89.162
sockperf: ---> <MIN> observation = 67.762

# case 4:
sockperf: ====> avg-lat=110.159 (std-dev=555.153)
sockperf: ---> <MAX> observation = 13255.732
sockperf: ---> percentile 99.99 = 13255.732
sockperf: ---> percentile 99.90 = 12448.629
sockperf: ---> percentile 99.50 = 104.799
sockperf: ---> percentile 99.00 = 101.954
sockperf: ---> percentile 95.00 = 97.295
sockperf: ---> percentile 90.00 = 95.995
sockperf: ---> percentile 75.00 = 91.866
sockperf: ---> percentile 50.00 = 88.803
sockperf: ---> percentile 25.00 = 71.088
sockperf: ---> <MIN> observation = 65.826

# case 5:
sockperf: ====> avg-lat=114.984 (std-dev=3.782)
sockperf: ---> <MAX> observation = 136.748
sockperf: ---> percentile 99.99 = 136.748
sockperf: ---> percentile 99.90 = 134.192
sockperf: ---> percentile 99.50 = 131.467
sockperf: ---> percentile 99.00 = 130.200
sockperf: ---> percentile 95.00 = 121.575
sockperf: ---> percentile 90.00 = 118.518
sockperf: ---> percentile 75.00 = 116.343
sockperf: ---> percentile 50.00 = 114.356
sockperf: ---> percentile 25.00 = 112.479
sockperf: ---> <MIN> observation = 94.932

> About this below here, I'll read carefully and think about it. Thanks
> again.

Thank you, Dario.
As for bugs 2 and 3: although they will not influence throughput, they
make latency, and especially long tail latency, a big issue.

>> [Bug 2]: In csched_acct() (run by default every 30ms), a VCPU stops
>> earning credits and is removed from the active VCPU list [...]
>>
>> [Bug 3]: The BOOST priority might be changed to UNDER before the
>> boosted VCPU preempts the currently running VCPU. [...]

--
Tony S
Ph.D. student, University of Colorado, Colorado Springs

* Re: [BUG] Bugs in Xen's credit scheduler cause long tail latency issues
From: George Dunlap
Date: 2016-05-17 9:27 UTC
To: Tony S
Cc: xen-devel@lists.xen.org

On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@gmail.com> wrote:
> Hi all,
>
> When I was running latency-sensitive applications in VMs on Xen, I
> found some bugs in the credit scheduler that cause long tail latency
> in I/O-intensive VMs.
>
> [...]
>
> (2) Problem analysis

Hey Tony,

Thanks for looking at this. These issues in the credit1 algorithm are
essentially exactly the reason that I started work on the credit2
scheduler several years ago. We meant credit2 to have replaced credit1
by now, but we ran out of time to test it properly; we're in the
process of doing that right now, and are hoping it will be the default
scheduler for the 4.8 release.

So if I could make two suggestions that would make your effort more
helpful to us:

1. Use cpupools for testing rather than pinning. A lot of the
algorithms are designed with the assumption that they have all the cpus
to run on, and the credit allocation / priority algorithms fail to work
properly when vCPUs are merely pinned. Cpupools was specifically
designed to allow the scheduler algorithms to work as designed with a
smaller number of cpus than the system has.

2. Test credit2. :-)

One comment about your analysis here...

> [Bug 2]: In csched_acct() (run by default every 30ms), a VCPU stops
> earning credits and is removed from the active VCPU list (in
> __csched_vcpu_acct_stop_locked()) if its credit is larger than the
> upper bound. Because the domain has only one VCPU, the domain will
> also be removed from the active domain list.
> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
> __csched_vcpu_acct_start() is executed and tries to put inactive
> VCPUs back on the active list. However, __csched_vcpu_acct_start()
> only puts the current VCPU back on the active list. If an I/O-bound
> VCPU is not the current VCPU at csched_tick() time, it will not be
> put back on the active VCPU list. If so, the I/O-bound VCPU will
> likely miss the next credit refill in csched_acct() and can easily
> enter the OVER state. As such, the I/O-bound VM cannot be boosted and
> suffers very long latency. It takes at least one time slice (e.g.,
> 30ms) before the I/O VM is activated again and starts to receive
> credits.
>
> [Possible Solution] Activate all inactive VCPUs back to the active
> list before the next credit refill, instead of just the current VCPU.

When we stop accounting, we divide the credits in half, so that when it
starts again, it should have a reasonable amount of credit (15ms worth).
Is this not taking effect for some reason?

 -George

* Re: [BUG] Bugs in Xen's credit scheduler cause long tail latency issues
From: Tony S
Date: 2016-05-17 16:11 UTC
To: George Dunlap, xen-devel@lists.xen.org
Cc: Dario Faggioli

On Tue, May 17, 2016 at 3:27 AM, George Dunlap <dunlapg@umich.edu> wrote:
> On Sun, May 15, 2016 at 5:11 AM, Tony S <suokunstar@gmail.com> wrote:
>> Hi all,
>>
>> When I was running latency-sensitive applications in VMs on Xen, I
>> found some bugs in the credit scheduler that cause long tail latency
>> in I/O-intensive VMs.
>>
>> [...]
>
> Hey Tony,
>
> Thanks for looking at this. These issues in the credit1 algorithm are
> essentially exactly the reason that I started work on the credit2
> scheduler several years ago. [...]
>
> 1. Use cpupools for testing rather than pinning. [...]
>
> 2. Test credit2. :-)

Hi George,

Thank you for your reply. I will try cpupools and credit2 later. :-)

> One comment about your analysis here...
>> [Bug 2]: In csched_acct() (run by default every 30ms), a VCPU stops
>> earning credits and is removed from the active VCPU list (in
>> __csched_vcpu_acct_stop_locked()) if its credit is larger than the
>> upper bound. [...]
>>
>> Every 10ms, csched_tick() --> csched_vcpu_acct() -->
>> __csched_vcpu_acct_start() is executed and tries to put inactive
>> VCPUs back on the active list. [...]
>>
>> [Possible Solution] Activate all inactive VCPUs back to the active
>> list before the next credit refill, instead of just the current VCPU.
>
> When we stop accounting, we divide the credits in half, so that when
> it starts again, it should have a reasonable amount of credit (15ms
> worth). Is this not taking effect for some reason?

Actually, for bug 2, dividing the credits in half so that the VCPU
restarts with a reasonable credit is not the issue. The problem is that
the VCPU is removed from the active VCPU list (in
__csched_vcpu_acct_stop_locked()) and sometimes is not put back on the
active list in time (as I explained in my first message). If the VCPU
is not active, csched_acct() will not allocate new credits to it at the
next refill. After many such rounds, the credit of this VCPU becomes a
small negative number (e.g., -1000) and it won't be scheduled. The
I/O-intensive applications on it, especially latency-sensitive
workloads, then suffer long tail latencies.

--
Tony S
Ph.D. student, University of Colorado, Colorado Springs
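
To make the drift concrete, here is a toy user-space simulation of the
accounting loop described above (purely illustrative: the constants only
loosely mirror credit1's 30ms accounting period and the "15ms worth"
figure from George's remark about halving, and none of this is Xen code):

#include <stdbool.h>
#include <stdio.h>

#define ROUNDS  10    /* simulate ten 30ms accounting rounds            */
#define REFILL  300   /* credits granted per round to an *active* vCPU  */
#define BURN    40    /* credits an I/O vCPU burns per round (it runs
                         only briefly to handle requests)               */

int main(void)
{
    /* ~15ms worth of credit left after the halving at acct-stop time. */
    int credit = 150;
    /* The bug: dropped from the active list and not put back in time. */
    bool active = false;

    for (int round = 1; round <= ROUNDS; round++)
    {
        credit -= BURN;              /* handling requests costs credit  */
        if (active)
            credit += REFILL;        /* refill happens in csched_acct() */
        printf("round %2d: credit = %4d  (%s)\n", round, credit,
               credit < 0 ? "OVER, never boosted" : "UNDER");
    }
    return 0;
}

Run as-is, the balance goes negative on round 4 and never recovers; flip
active to true and the very first refill pulls it back up, which is the
point of reactivating all inactive VCPUs before csched_acct()
redistributes credit.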