From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Kinzler
Subject: State of GPLPV tests - 28.11.11
Date: Mon, 28 Nov 2011 14:49:24 +0100
Message-ID: <4ED39164.5040203@hfp.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: James Harper, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Hello James,

I am still running tests 7 days a week on two test systems. Results are
quite discouraging though. After experiencing crash after crash I wanted
to test if the configuration I called "stable" (Xen 4.0.1, GPLPV
0.11.0.213, dom0 kernel 2.6.32.18-pvops0-ak3) was stable indeed. But
even that config crashed when running my torture test. It is stable on
our production systems - running other workloads of course.

> One thing I thought of... virtualisation gives an interesting
> opportunity to exaggerate race conditions. If you have 8 vCPU's in a
> DomU but only let one or two physical CPUs service those 8 vCPU's, then
> it can give rise to race conditions which could only be rarely seen
> (or never seen) in normal operation. It's awful for performance but
> if you could try that and see if it gives rise to crashes a bit
> more frequently it might help us track down the problem.

What exactly is the config you are talking about in terms of Xen/dom0
command line? In terms of domU config files?

As always, I monitor your mercurial repo ;-) How would you see the
relationship of commits 952+953 to our problem? 952 seems to affect LSO
in some way since LsoV1TransmitComplete.TcpPayload is finally wrong
(could it be negative since tx_length is smaller than the fixed
tx_length?). What about 953?

One more thought: As mentioned earlier crashes often occurred after an
uptime of 9-10 days and these crashes occurred too consistently to be a
"by chance" event. In my torture tests I am NOT USING a Windows NTP
service (I use the meinberg NTP daemon on Windows). But on production I
do. Can you see any possible impact here?

Regards
Andreas