From: Martin Schmitt
To: Marcelo Tosatti
Cc: kvm@vger.kernel.org
Subject: Re: Soft lockups and cpu frequency scaling
Date: Mon, 04 Jan 2010 15:01:29 +0100
Message-ID: <4B41F4B9.4090005@scsy.de>
In-Reply-To: <20091224155210.GB13003@amt.cnet>
References: <4B30C6C1.9080508@scsy.de> <20091224155210.GB13003@amt.cnet>

Ciao Marcelo,

sorry for getting back so late. Thanks for your patience. :-)

Marcelo Tosatti wrote:

>> I'm running a manually compiled KVM on CentOS 5.4. The KVM installation
>> has been carried over from CentOS 5.3, when KVM wasn't distributed with
>> the OS. (I tried to migrate to CentOS 5.4 native KVM support, but wasn't
>> able to get along with RedHat's interpretation of KVM.)
>>
>> The KVM version used is 88, on Kernel 2.6.18-128.7.1.el5, as KVM doesn't
>> seem to compile on CentOS' current 2.6.18-164.9.1.el5.
>>
>> Only on CentOS guests, I see very frequent "soft lockup" messages and
>> excessively hanging KVM instances.
>
> Can you please share some of the soft lockup messages.
>
> And how exactly are the VMs hanging?

They are unresponsive for a few seconds. More "hiccuping" than hanging.
It appears to be I/O-related in some way, because it happens most
frequently when I do things on the file system.

Dmesg is full of these:

BUG: soft lockup - CPU#0 stuck for 10s! [kblockd/0:10]

Pid: 10, comm: kblockd/0
EIP: 0060:[] CPU: 0
EIP is at ide_outb+0x4/0x5
 EFLAGS: 00000202    Not tainted  (2.6.18-164.6.1.el5 #1)
EAX: 00000001 EBX: c07e2f80 ECX: 00000286 EDX: 0000c000
ESI: 00000011 EDI: 00000000 EBP: c07e3014 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f3c000 CR3: 12122000 CR4: 000006d0
 [] ide_dma_start+0x22/0x2e
 [] ide_do_rw_disk+0x3b2/0x4a6
 [] ide_do_request+0x533/0x6bf
 [] freed_request+0x1d/0x37
 [] ide_end_request+0xcc/0xd4
 [] ide_intr+0x167/0x190
 [] handle_IRQ_event+0x45/0x8c
 [] __do_IRQ+0x84/0xd6
 [] __do_IRQ+0x0/0xd6
 [] do_IRQ+0x99/0xc3
 [] common_interrupt+0x1a/0x20
 [] __do_softirq+0x57/0x114
 [] do_softirq+0x52/0x9c
 [] apic_timer_interrupt+0x1f/0x24
 [] ide_outb+0x4/0x5
 [] ide_dma_start+0x22/0x2e
 [] ide_do_rw_disk+0x3b2/0x4a6
 [] ide_do_request+0x533/0x6bf
 [] cfq_kick_queue+0x70/0x80
 [] run_workqueue+0x78/0xb5
 [] cfq_kick_queue+0x0/0x80
 [] worker_thread+0xd9/0x10b
 [] default_wake_function+0x0/0xc
 [] worker_thread+0x0/0x10b
 [] kthread+0xc0/0xeb
 [] kthread+0x0/0xeb
 [] kernel_thread_helper+0x7/0x10
 =======================

>> Long-term stability is fine (several months uptime), but disturbed
>> by the hangs. The problem already was there on CentOS 5.3 as well.
>> With the Debian guests on the same host, I have never had any apparent
>> problems.
>
> Questions:
>
> - Is there significant swapping on the host?
> - Are you migrating vm's?

No migration and no swap activity. The host has plenty of idle RAM:

[root@zulu ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7987       7904         82          0        667       5101
-/+ buffers/cache:       2135       5851
Swap:         1983          0       1983

>> A number of google results suggest that I should work with CPU scaling
>> on the CentOS guest systems, but unfortunately, CPU scaling is not
>> available in my guests. So, here's my question: How do I enable CPU
>> scaling in KVM guests? Or is there any other measure against these soft
>> lockups that you can recommend?
>
> What probably was suggested is to disable cpu frequency scaling on the
> host. Please provide more details on the host system.

Host is a Quadcore Xeon HP DL320 G5 with CentOS 5.4, old Kernel
2.6.18-128.7.1.el5.

There are no hints toward CPU scaling in /sys/devices/system/ on the host:

[root@zulu ~]# ls -l /sys/devices/system/cpu/cpu0
total 0
drwxr-xr-x 5 root root    0 Nov  7 13:47 cache
-r-------- 1 root root 4096 Jan  4 14:55 crash_notes
drwxr-xr-x 2 root root    0 Nov  7 13:48 topology

The file "crash_notes" contains the following number: 22792b400

Thanks for your help,

-martin

-- 
Martin Schmitt - Schmitt Systemberatung - http://www.scsy.de
DE 35415 Pohlheim, Gießener Str. 18
DE 65307 Bad Schwalbach, Am Bräunchesberg 9
Linux/UNIX - Internet - E-Mail Infrastructure - Antispam/Antivirus
- "What goes up, must come down. Ask any system administrator." -
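
The host check described in the message (looking for CPU scaling entries under /sys/devices/system/cpu/cpu0) can be repeated for every CPU with a short script. This is only a sketch and not part of the original message. It assumes Python 3 and the usual Linux sysfs layout, in which a CPU with frequency scaling exposes a cpufreq/scaling_governor file. The function name cpufreq_status is made up for illustration.

```python
import os
import re


def cpufreq_status(cpu_root="/sys/devices/system/cpu"):
    """Map each cpuN directory to its scaling governor.

    The value is None when the CPU has no readable
    cpufreq/scaling_governor file, as on the host in the message.
    """
    status = {}
    for name in sorted(os.listdir(cpu_root)):
        # Only per-CPU directories such as cpu0, cpu1, ... are of interest;
        # siblings like "cpuidle" or "cpufreq" are skipped.
        if not re.fullmatch(r"cpu\d+", name):
            continue
        governor_file = os.path.join(cpu_root, name, "cpufreq", "scaling_governor")
        try:
            with open(governor_file) as fh:
                status[name] = fh.read().strip()
        except OSError:
            status[name] = None
    return status


if __name__ == "__main__":
    for cpu, governor in cpufreq_status().items():
        print(cpu, governor or "no cpufreq interface exposed")
```

On a host like the one shown, where cpu0 contains only cache, crash_notes and topology, every CPU would be reported as having no cpufreq interface.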