From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tomasz Chmielewski
Subject: strange guest slowness after some time
Date: Sat, 07 Mar 2009 16:47:17 +0100
Message-ID: <49B29705.6000904@wpkg.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
To: "kvm@vger.kernel.org"
Return-path:
Received: from mx03.syneticon.net ([78.111.66.105]:48766 "EHLO mx03.syneticon.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751362AbZCGPrZ (ORCPT ); Sat, 7 Mar 2009 10:47:25 -0500
Received: from localhost (filter1.syneticon.net [192.168.113.83]) by mx03.syneticon.net (Postfix) with ESMTP id C5A1B35DF3 for ; Sat, 7 Mar 2009 16:47:21 +0100 (CET)
Received: from mx03.syneticon.net ([192.168.113.84]) by localhost (mx03.syneticon.net [192.168.113.83]) (amavisd-new, port 10025) with ESMTP id z435w5Kyqcbe for ; Sat, 7 Mar 2009 16:47:16 +0100 (CET)
Received: from [192.168.10.145] (koln-4db4012c.pool.einsundeins.de [77.180.1.44]) by mx03.syneticon.net (Postfix) with ESMTPSA for ; Sat, 7 Mar 2009 16:47:18 +0100 (CET)
Sender: kvm-owner@vger.kernel.org
List-ID:

Some of my guests become strangely slow after they have been running for some time. The slowness can set in a few hours after guest start, or a couple of days after guest start.

What do I mean by "slowness"?

Logging in via SSH to an unaffected guest takes less than a second:

$ time ssh backupuser@normal_guest exit
0.02user 0.01system 0:00.67elapsed 4%CPU (0avgtext+0avgdata 0maxresident)

Logging in to an affected guest running on the same host takes more than 12 seconds:

$ time ssh backupuser@slow_guest exit
0.02user 0.01system 0:12.56elapsed 0%CPU (0avgtext+0avgdata 0maxresident)

Once I am logged in via SSH to the affected guest, every key press lags by a second or two.
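
A single "time ssh ... exit" run is only one sample. To repeat the measurement and compare guests over a longer period, something along these lines could be used. It is only a sketch: the helper names time_command and summarize, and the default of five runs, are made up for illustration and are not part of any tool mentioned in this mail.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # check=False: a failed or refused login is still timed, not raised.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        durations.append(time.monotonic() - start)
    return durations


def summarize(durations):
    """Return (min, median, max) of a non-empty list of durations."""
    return min(durations), statistics.median(durations), max(durations)
```

It could be called as time_command(["ssh", "backupuser@slow_guest", "exit"]) and the result passed to summarize; the user and host names are the same placeholders as in the commands above.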

This is the weird part: if I run something IO-intensive on the guest, the login is much faster (running CPU-intensive tasks makes no difference):

guest# dd if=/dev/vda of=/dev/null

$ time ssh backupuser@slow_guest exit
0.02user 0.00system 0:00.70elapsed 2%CPU (0avgtext+0avgdata 0maxresident)

Running "ping -f " also helps a lot, and SSH logins are fast.

Look at the difference here: 7470 ms vs 139183 ms for 10000 packets, plus 66 lost packets on the slow guest (which ping reports as 0% packet loss):

# ping -f -c 10000 normal_guest
10000 packets transmitted, 10000 received, 0% packet loss, time 7470ms
rtt min/avg/max/mdev = 0.443/0.709/6.487/0.112 ms, ipg/ewma 0.747/0.716 ms

# ping -f -c 10000 slow_guest
10000 packets transmitted, 9934 received, 0% packet loss, time 139183ms
rtt min/avg/max/mdev = 0.470/14.337/50.455/5.409 ms, pipe 4, ipg/ewma 13.919/14.788 ms

CPU-intensive tasks run as fast as on unaffected guests, and so does reading from /dev/vda. So the only thing broken seems to be the network.

Rebooting the guest does not help; it is still slow afterwards. The only thing that helps is stopping the guest and starting it again (i.e. stopping the kvm process and starting a new one).

Is there an explanation for this phenomenon? Could it be a problem somewhere in the virtio drivers?

The host is running kvm-83. Affected guests are running 2.6.27.14 kernels and use virtio drivers.

The problem happens only _sometimes_. Of the 9 guests running on this host, I have seen it on only 3, and never on more than one guest at a time. All three have 512 MB of memory assigned; the other guests have less.

-- 
Tomasz Chmielewski
http://wpkg.org