From mboxrd@z Thu Jan 1 00:00:00 1970
From: NISHIGUCHI Naoki
Subject: Re: [PATCH] Accurate vcpu weighting for credit scheduler
Date: Thu, 18 Dec 2008 12:20:28 +0900
Message-ID: <4949C17C.50403@jp.fujitsu.com>
References: <20081215065332.65B7948004B@m024.s.css.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <20081215065332.65B7948004B@m024.s.css.fujitsu.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Atsushi SAKAI, xen-devel@lists.xensource.com
Cc: George Dunlap, Emmanuel Ackaouy
List-Id: xen-devel@lists.xenproject.org

Hi, Atsushi

After applying my patches, I ran a similar test.
The CPU% results are as follows:

dom0 25
dom1 25
dom2 50
dom3 100

What do you think about my patches?

Regards,
Naoki Nishiguchi

Atsushi SAKAI wrote:
> Hi, George
>
> Sorry for the delay.
>
> With this type of change,
> the CPU% results are as follows:
> dom1 26
> dom2 26
> dom3 51
> dom4 96
>
> Thanks
> Atsushi SAKAI
>
> "George Dunlap" wrote:
>
>> OK, I've ground through an example by hand and think I see what's going on.
>>
>> So the idea of the credit scheduler is that we have a certain number
>> of "credits" per accounting period, and each of these credits
>> represents a certain amount of time.  The scheduler gives out credits
>> according to weight, so theoretically each accounting period, if all
>> vcpus are active, each should consume all of its credits.  Based on
>> that assumption, if a vcpu has run and accumulated more than one full
>> accounting period of credits, it's probably idle and we can leave it
>> be.
>>
>> The problem in this situation isn't so much with rounding errors as
>> with *scheduling granularity*.
>> In the example given:
>>
>> d1: weight 128
>> d2: weight 128
>> d3: weight 256
>> d4: weight 512
>>
>> If each domain has 2 vcpus, and there are 2 cores, then the credits
>> will be divided thus:
>>
>> d1: 37 credits / vcpu
>> d2: 37 credits / vcpu
>> d3: 75 credits / vcpu
>> d4: 150 credits / vcpu
>>
>> But scheduling and accounting only happen every "tick", and every
>> "tick" is 100 credits.  So each vcpu of d{1,2}, instead of
>> consuming 37 credits, consumes 100; the same goes for each vcpu of d3.
>> At the end of the first accounting period, d{1,2,3} have gotten to run
>> for 100 credits' worth of time, but d4 hasn't gotten to run at all.
>>
>> In short, the fact that we have a 100-credit scheduling granularity
>> breaks the assumption that every VM has had a chance to run each
>> accounting period when there are really long runqueues.
>>
>> I can think of a couple of solutions: the simplest one might be to
>> sort the runqueue by number of credits -- at least every accounting
>> period.  In that case, d4 would always get to run every accounting
>> period; d{1,2} might not run for a given accounting period, but the
>> next time they would have twice the number of credits, &c.
>>
>> Others might include extending accounting periods when we have long
>> runqueues, or applying the credit limit during accounting only if the
>> vcpu is not on the runqueue (Sakai-san's idea) *combined* with a check
>> when the vcpu blocks.  That would catch vcpus that are only moderately
>> active, but just happen to be on the runqueue for several accounting
>> periods in a row.
>>
>> Sakai-san, would you be willing to try to implement a simple "runqueue
>> sort" patch, and see if it also solves your scheduling issue?
>>
>> -George
>>
>> On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI wrote:
>>> Hi, Emmanuel
>>>
>>> 1) Rounding error for credit
>>>
>>> The problem this patch fixes is larger than rounding error,
>>> so I do not think that effect needs to be considered.
>>> If you think otherwise, could you suggest a patch?
>>> It seems changing CSCHED_TICKS_PER_ACCT is not enough.
>>>
>>> 2) Effect on I/O-intensive jobs
>>>
>>> I did not change the code for BOOST priority;
>>> I only changed the "credit reset" condition.
>>> It should have no effect on I/O-intensive jobs (but I have not measured it).
>>> If needed, I will test it.
>>> Which test is best for this change?
>>> (A simple I/O test is not enough for this case;
>>> I think a complex domain I/O configuration is needed to prove this patch's effect.)
>>>
>>> 3) vcpu allocation measurement
>>>
>>> At first, I used
>>> http://weather.ou.edu/~apw/projects/stress/
>>> stress --cpu xx --timeout xx --verbose
>>> and then a simple test (since each domain has 2 vcpus):
>>> yes > /dev/null &
>>> yes > /dev/null &
>>> Now I have tested with the suggested method; the results are:
>>>         original   w/ patch
>>> dom1       27         25
>>> dom2       27         25
>>> dom3       53         50
>>> dom4       91         98
>>>
>>>
>>> Thanks
>>> Atsushi SAKAI
>>>
>>>
>>>
>>>
>>> Emmanuel Ackaouy wrote:
>>>
>>>> On Dec 9, 2008, at 2:25, George Dunlap wrote:
>>>>> On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI wrote:
>>>>>> You mean it should get rid of "credit reset"?
>>>>> Yes, that's exactly what I was thinking.  Removing the check for vcpus
>>>>> on the runqueue may actually be functionally equivalent to removing
>>>>> the check altogether.
>>>> Essentially, this code is there as a safeguard against rounding errors
>>>> and other oddball cases. In theory, a runnable VCPU should seldom
>>>> accumulate more than one time slice's worth of credits.
>>>>
>>>> The problem with your change is that a VCPU that is not a spinner
>>>> but instead runs and sleeps may not be removed from the accounting
>>>> list when it should be, because it will not always be running when
>>>> accounting (and the check in question) is performed. Potentially this
>>>> will do very bad things for VCPUs that are I/O-intensive or otherwise
>>>> yield or sleep for a short time before consuming a full time slice.
>>>>
>>>> One thing that may help here is to make the credit calculations less
>>>> prone to rounding errors. One thing I had wanted to do while at
>>>> XenSource but never got around to was to change the arithmetic
>>>> so that instead of 30 credits representing a time slice, we would
>>>> make this a much bigger number.
>>>>
>>>> In this case, for example, you would get credit allocations with
>>>> smaller rounding errors if you used 30000 instead of 30
>>>> credits per time slice:
>>>>
>>>> dom1 vcpu0,1 w128 credit 3750
>>>> dom2 vcpu0,1 w128 credit 3750
>>>> dom3 vcpu0,1 w256 credit 7500
>>>> dom4 vcpu0,1 w512 credit 15000
>>>>
>>>> I suspect this would get rid of a large number of cases such as the
>>>> one you are reporting, where a runnable VCPU's credit exceeds
>>>> one entire time slice. This type of change would improve accuracy
>>>> and not screw up credit computation for I/O-intensive and other
>>>> non-spinning domains.
>>>>
>>>> What do you think?
>>>>
>>>> Also please confirm that your VCPUs are indeed doing simple
>>>> "while(1);" loops.
>>>>
>>>> Cheers,
>>>> Emmanuel.
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
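The arithmetic in George's example, and the effect of the "runqueue sort" he proposes, can be checked with a small toy model. This is a sketch under stated assumptions, not Xen's credit scheduler: 2 pcpus, 100 credits per tick and 3 ticks per accounting period are inferred from the figures quoted above (2 x 3 x 100 = 600 credits per period reproduces the 37/75/150 split), each vcpu is assumed to run for one whole tick whenever it reaches the head of a simple FIFO queue, and every name in the code (make_vcpus, assign_credits, run_period) is hypothetical.

```python
from collections import deque
from dataclasses import dataclass

# Toy model of the example in this thread; NOT Xen's credit scheduler.
# All parameters below are assumptions inferred from the figures quoted above.
NR_PCPUS = 2
TICKS_PER_ACCT = 3        # assumed: 3 ticks per accounting period
CREDITS_PER_TICK = 100    # "every tick is 100 credits"
WEIGHTS = {1: 128, 2: 128, 3: 256, 4: 512}  # domain -> weight
VCPUS_PER_DOM = 2


@dataclass
class Vcpu:
    dom: int
    idx: int
    credit: int = 0
    ticks: int = 0


def make_vcpus():
    """d1v0, d1v1, d2v0, ... in domain order (the assumed FIFO order)."""
    return [Vcpu(d, i) for d in sorted(WEIGHTS) for i in range(VCPUS_PER_DOM)]


def assign_credits(vcpus, total):
    """Split `total` credits by weight, then evenly over each domain's
    vcpus, truncating with integer division at both steps."""
    wsum = sum(WEIGHTS.values())
    for v in vcpus:
        v.credit = (total * WEIGHTS[v.dom] // wsum) // VCPUS_PER_DOM
        v.ticks = 0


def run_period(order):
    """One accounting period: each tick, every pcpu takes the vcpu at the
    head of the queue, debits a whole tick of credit, and requeues it at
    the tail. Returns the number of ticks each domain received."""
    queue = deque(order)
    for _ in range(NR_PCPUS * TICKS_PER_ACCT):
        v = queue.popleft()
        v.credit -= CREDITS_PER_TICK   # a whole tick, however little was owed
        v.ticks += 1
        queue.append(v)
    got = {d: 0 for d in WEIGHTS}
    for v in queue:
        got[v.dom] += v.ticks
    return got


PER_ACCT = NR_PCPUS * TICKS_PER_ACCT * CREDITS_PER_TICK  # 600

vcpus = make_vcpus()
assign_credits(vcpus, PER_ACCT)
print([v.credit for v in vcpus])   # [37, 37, 37, 37, 75, 75, 150, 150]
print(run_period(vcpus))           # FIFO: {1: 2, 2: 2, 3: 2, 4: 0}

vcpus = make_vcpus()
assign_credits(vcpus, PER_ACCT)
by_credit = sorted(vcpus, key=lambda v: v.credit, reverse=True)  # stable
print(run_period(by_credit))       # sorted: {1: 2, 2: 0, 3: 2, 4: 2}
```

Under FIFO order d4 never reaches the head of the queue within the period, matching George's description; sorting by credit lets d4 run and leaves d2 waiting instead. Calling assign_credits with 60000 instead of 600 reproduces the 3750/7500/15000 figures from Emmanuel's mail and avoids the 2 credits per period that truncation loses at 600 (598 are handed out), but in this toy model which vcpus run under FIFO depends only on queue order, which is George's point about granularity.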