From: Steve Ofsthun
Subject: Re: Does Xen detect busy-spinning VCPUs?
Date: Fri, 19 Sep 2008 13:00:52 -0400
Message-ID: <48D3DAC4.6030106@virtualiron.com>
References: <9548184c-cd3e-4dac-afa6-13480c178e79@default> <200809191747.02817.kaiser@informatik.fh-wiesbaden.de>
In-Reply-To: <200809191747.02817.kaiser@informatik.fh-wiesbaden.de>
To: Robert Kaiser
Cc: Daniel Magenheimer, xen-devel@lists.xensource.com

Robert Kaiser wrote:
> Daniel,
>
> thanks for your response!
>
> On Friday 19 September 2008 16:36:13, Daniel Magenheimer wrote:
>> Is your mini-OS pinned and you're sure dom0 or other
>> domains are not getting a piece of the pcpu? If so...
>
> Well, as I said: it's my own scheduler that decides. I can see that it gets
> invoked a few times, but I checked that it always returns the VCPU that is
> running the spinning loop. Thus, if everything outside my scheduler plays by
> (what I think are) the rules, that VCPU should be the only one to get access
> to the PCPU (except for interrupt-level activities, of course). So, just
> _assuming_ that interrupt processing does not eat up those tens of
> milliseconds, where else can they possibly go?
>
> Any hints as to how I could proceed to pinpoint this problem?

Try your minios test domain w/o your own changes to the Xen scheduler. When you run the test, use a uniprocessor dom0 bound to cpu 0. Bind your minios test domain to cpu 1. This will verify your test domain code independently of your Xen scheduler changes.
If your test domain is still seeing large time jumps, verify that the idle vcpu for cpu 1 is not getting any cpu time. If it is, your test domain is doing something that is causing it to block.

Steve

> So far I have debugged my code by running the entire system on Qemu, using
> its built-in debug stub. However, anything time-related behaves completely
> differently on Qemu than on real hardware, so I can't use that setup any
> more. Presently, I'm trying to get Xen's built-in GDB stub to work; I wonder
> if that will be any better than Qemu. AFAIU, the stub would have to preserve
> TSC register contents across breakpoints, otherwise the time coordinate
> perceived by the system will jump erratically. Not sure if it really does
> that, so this may turn out to be another dead end -- oh well...
>
>> I've seen anecdotal evidence of long pauses that led me
>> to wonder about interrupt latency here:
>>
>> http://lists.xensource.com/archives/html/xen-devel/2008-08/msg00232.html
>>
>> I don't recall the situation or the length of the pause
>> but perhaps you are seeing something similar. Unfortunately,
>
> I am seeing situations where two subsequent calls to NOW() in the Mini-OS
> context deliver time coordinates that differ by 95(!) milliseconds. If this
> were due to interrupt latencies, surely that would have been noticed by
> someone?
>
> Cheers
>
> Rob
>
>> I never pursued the answer to the interrupt latency question.
>>
>>> -----Original Message-----
>>> From: Robert Kaiser [mailto:kaiser@informatik.fh-wiesbaden.de]
>>> Sent: Friday, September 19, 2008 6:00 AM
>>> To: xen-devel@lists.xensource.com
>>> Subject: [Xen-devel] Does Xen detect busy-spinning VCPUs?
>>>
>>> Hi all,
>>>
>>> I'm currently developing/testing a new scheduler for Xen and I am seeing
>>> some very strange behaviour which I can't seem to pinpoint: for
>>> benchmarking purposes, I am running a task inside Mini-OS in a tight,
>>> busy-spinning loop for some time. The loop repeatedly polls NOW() until it
>>> exceeds a certain time limit. What I am observing is that NOW() seems to
>>> "jump" sometimes: two subsequent reads return values which differ by tens
>>> of milliseconds! I notice that my scheduler gets invoked a couple of times,
>>> but it does *not* switch to another VCPU and I doubt that the scheduler
>>> invocations alone take that long. So the loop should indeed be continuously
>>> spinning with sporadic interruptions in the range of a few microseconds,
>>> but not tens of milliseconds. Yet, this is not what I am seeing. I wonder
>>> where the (P)CPU goes during those time intervals, and so this possibly
>>> weird idea came up that Xen might use some trickery trying to detect and
>>> pause busy-spinning VCPUs. Is there anything like that in Xen (BTW: this is
>>> xen-3.2.1), and, if there is, can it be disabled for a given domain?
>>>
>>> (Sorry if this is a silly question. Since my code is experimental and not
>>> well tested yet, there is of course the possibility that I made some
>>> stupid mistake. However, I've been staring at code, debug logs, etc. for
>>> several days now without much success and I am slowly getting desperate.
>>> If Xen really does pause spinning VCPUs it would explain everything.)
>>>
>>> Thanks for any help
>>>
>>> Rob
>>>
>>> --
>>> Robert Kaiser
>>> http://wwwvs.informatik.fh-wiesbaden.de
>>> Labor für Verteilte Systeme
>>> kaiser@informatik.fh-wiesbaden.de
>>> FH Wiesbaden - University of Applied Sciences      tel: (+49)611-9495-1294
>>> Kurt-Schumacher-Ring 18, 65197 Wiesbaden, Germany  fax: (+49)611-9495-1289
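Both the 95 ms observation and the worry about the GDB stub not preserving the TSC depend on how a paravirtualized guest turns the TSC into nanoseconds. The sketch below is a hedged illustration, assuming Mini-OS's NOW() follows the usual Xen scheme of extrapolating from the last per-VCPU time sample using the TSC. The struct `time_info` and the functions `scale_delta`, `time_from_tsc` and `max_gap_ns` are hypothetical names: the fields are modelled on Xen's `vcpu_time_info`, and the version-counter retry a real reader needs is omitted. `max_gap_ns` computes the largest step between consecutive readings, which is effectively what the busy-spinning loop in the original message observes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified copy of the TSC-extrapolation fields Xen
 * publishes per VCPU (modelled on struct vcpu_time_info). The real
 * structure also has a version counter that readers must re-check. */
struct time_info {
    uint64_t tsc_timestamp;     /* TSC value when system_time was sampled */
    uint64_t system_time;       /* guest system time (ns) at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 0.32 fixed-point fraction (ns per tick) */
    int8_t   tsc_shift;         /* shift applied to the TSC delta first */
};

/* Convert a TSC delta to nanoseconds: shift, then multiply by a 0.32
 * fixed-point fraction. The 64x32 multiply is split into high and low
 * halves so the intermediate product does not overflow 64 bits. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    uint64_t lo = delta & 0xffffffffu;
    uint64_t hi = delta >> 32;
    return ((lo * mul) >> 32) + hi * mul;
}

/* NOW()-style reading: extrapolate from the last published sample.
 * Any jump in the TSC value passed in becomes a jump in the result. */
static uint64_t time_from_tsc(const struct time_info *ti, uint64_t tsc)
{
    return ti->system_time +
           scale_delta(tsc - ti->tsc_timestamp,
                       ti->tsc_to_system_mul, ti->tsc_shift);
}

/* Largest difference between consecutive timestamps, i.e. the longest
 * stretch during which a polling loop did not get to read the clock. */
static uint64_t max_gap_ns(const uint64_t *stamps, size_t n)
{
    uint64_t max_gap = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = stamps[i] - stamps[i - 1];
        if (gap > max_gap)
            max_gap = gap;
    }
    return max_gap;
}
```

For illustration only, assuming a 2 GHz TSC (`tsc_to_system_mul = 0x80000000`, `tsc_shift = 0`, i.e. 0.5 ns per tick), a TSC that advances by 190,000,000 ticks between two consecutive reads shows up as a 95 ms gap, the size of jump reported above. Because the result is linear in the TSC delta, anything that perturbs the TSC between reads (for example a debugger that does not preserve it across breakpoints) turns directly into an apparent jump in time, independently of what the scheduler did.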