From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
Date: Wed, 02 Sep 2009 22:21:10 +0300
Message-ID: <4A9EC5A6.1040307@redhat.com>
References: <20090902172342.GA6090@nik-comp.linuxbox.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: KVM list , nikola.ciprich@linuxbox.cz
To: Nikola Ciprich
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:10922 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752248AbZIBTUz (ORCPT ); Wed, 2 Sep 2009 15:20:55 -0400
In-Reply-To: <20090902172342.GA6090@nik-comp.linuxbox.cz>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 09/02/2009 08:23 PM, Nikola Ciprich wrote:
> Hello,
> we're having a problem with one of our kvm guests. According to the storage summary mail Christopher
> sent a few days ago, I switched my disk model to SCSI, which should be the safest choice.
> But now we've stumbled upon the problem that the whole guest crashes when we run one
> specific postgres query which heavily loads it.
>

SCSI is hardly tested, so this isn't surprising.

> I can 100% reproduce this problem on both our production nodes (each 8 cores, 16GB RAM);
> on my testing machine (4 cores, 3GB RAM) this causes the HOST to reboot (which is even worse).
> We haven't experienced this problem with virtio.
>

AMD or Intel? Uni or SMP guests?

The host crash is of course more worrying. Can you capture dmesg?

> I tried both qemu-kvm-0.10.6 and kvm-88; the host is running 2.6.30.5, and I tried 2.6.29.x and
> 2.6.30.5 for the guest.
>
> Guest backtrace follows (it's a bit mangled as it's obtained using netconsole)
> Sep 2 19:01:20 sql2 [ 1564.795629] BUG: unable to handle kernel
> Sep 2 19:01:20 sql2 NULL pointer dereference
> Sep 2 19:01:20 sql2 NULL pointer dereference
> Sep 2 19:01:20 sql2 at 0000000000000358
> Sep 2 19:01:20 sql2 [ 1564.797727] IP:
> Sep 2 19:01:20 sql2 at 0000000000000358
> Sep 2 19:01:20 sql2 [ 1564.797727] IP:
> Sep 2 19:01:20 sql2 [] sym_int_sir+0x2bf/0x1590 [sym53c8xx]
> Sep 2 19:01:20 sql2 [] sym_int_sir+0x2bf/0x1590 [sym53c8xx]
>

The crash is in the SCSI driver, so our SCSI emulation is probably broken. Or (less likely) there is a bug in the driver itself.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.