From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1757140Ab1LNNnW (ORCPT ); Wed, 14 Dec 2011 08:43:22 -0500
Received: from mx1.redhat.com ([209.132.183.28]:53679 "EHLO mx1.redhat.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1754106Ab1LNNnV (ORCPT ); Wed, 14 Dec 2011 08:43:21 -0500
Message-ID: <4EE8A7ED.7060703@redhat.com>
Date: Wed, 14 Dec 2011 15:43:09 +0200
From: Avi Kivity
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0
MIME-Version: 1.0
To: Marcelo Tosatti
CC: Nate Custer , kvm@vger.kernel.org, linux-kernel , Jens Axboe
Subject: Re: kvm deadlock
References: <54FC5923-2123-4BDD-A506-EA57DCE0C1F6@cpanel.net> <20111214122511.GD18317@amt.cnet>
In-Reply-To: <20111214122511.GD18317@amt.cnet>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/14/2011 02:25 PM, Marcelo Tosatti wrote:
> On Mon, Dec 05, 2011 at 04:48:16PM -0600, Nate Custer wrote:
> > Hello,
> >
> > I am struggling with repeatable full hardware locks when running 8-12 KVM vms. At some point before the hard lock I get a inconsistent lock state warning. An example of this can be found here:
> >
> > http://pastebin.com/8wKhgE2C
> >
> > After that the server continues to run for a while and then starts its death spiral. When it reaches that point it fails to log anything further to the disk, but by attaching a console I have been able to get a stack trace documenting the final implosion:
> >
> > http://pastebin.com/PbcN76bd
> >
> > All of the cores end up hung and the server stops responding to all input, including SysRq commands.
> >
> > I have seen this behavior on two machines (dual E5606 running Fedora 16) both passed cpuburnin testing and memtest86 scans without error.
> >
> > I have reproduced the crash and stack traces from a Fedora debugging kernel - 3.1.2-1 and with a vanilla 3.1.4 kernel.
>
> Busted hardware, apparently. Can you reproduce these issues with the
> same workload on different hardware?

I don't think it's hardware related.  The second trace (in the first
paste) is called during swap, so GFP_FS is set.  The first one is not,
so GFP_FS is clear.  Lockdep is worried about the following scenario:

  acpi_early_init() is called
  calls pcpu_alloc(), which takes pcpu_alloc_mutex
  eventually, calls kmalloc(), or some other allocation function
  no memory, so swap
  call try_to_free_pages()
  submit_bio()
  blk_throtl_bio()
  blkio_alloc_blkg_stats()
  alloc_percpu()
  pcpu_alloc(), which takes pcpu_alloc_mutex
  deadlock

It's a little unlikely that acpi_early_init() will OOM, but lockdep
doesn't know that.  Other callers of pcpu_alloc() could trigger the same
thing.

When lockdep says

[ 5839.924953] other info that might help us debug this:
[ 5839.925396] Possible unsafe locking scenario:
[ 5839.925397]
[ 5839.925840] CPU0
[ 5839.926063] ----
[ 5839.926287] lock(pcpu_alloc_mutex);
[ 5839.926533]
[ 5839.926756] lock(pcpu_alloc_mutex);
[ 5839.926986]

It really means GFP_FS simply marks the beginning of a nested, unrelated
context that uses the same thread, just like an interrupt.  Kudos to
lockdep for catching that.

I think the allocation in blkio_alloc_blkg_stats() should be moved out
of the I/O path into some init function.  Copying Jens.

-- 
error compiling committee.c: too many arguments to function
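
For readers who want to see the shape of the scenario above outside the
kernel, here is a minimal user-space sketch.  It is an analogy, not kernel
code: alloc_mutex stands in for pcpu_alloc_mutex, and model_pcpu_alloc(),
model_reclaim() and run_lock_scenario() are hypothetical names invented for
this illustration.  It assumes a POSIX threads implementation and uses an
error-checking mutex so that the nested lock attempt is reported as EDEADLK
rather than hanging the thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for pcpu_alloc_mutex (assumption: user-space analogy only). */
static pthread_mutex_t alloc_mutex;

static int model_pcpu_alloc(int in_reclaim);

/*
 * Hypothetical stand-in for the reclaim path:
 * try_to_free_pages() -> submit_bio() -> ... -> alloc_percpu() -> pcpu_alloc()
 */
static int model_reclaim(void)
{
    return model_pcpu_alloc(1);
}

/*
 * Hypothetical model of pcpu_alloc(): takes the mutex, and on the outer
 * call pretends that an allocation under the mutex ran out of memory and
 * entered reclaim on the same thread.
 */
static int model_pcpu_alloc(int in_reclaim)
{
    int rc = pthread_mutex_lock(&alloc_mutex);
    if (rc != 0)
        return rc;              /* EDEADLK: this thread already owns it */

    int nested = 0;
    if (!in_reclaim)
        nested = model_reclaim();

    rc = pthread_mutex_unlock(&alloc_mutex);
    assert(rc == 0);
    return nested;
}

/* Runs the scenario once; returns 0 on success or an errno value. */
int run_lock_scenario(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Error-checking type: relocking by the owner fails instead of hanging. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&alloc_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = model_pcpu_alloc(0);
    pthread_mutex_destroy(&alloc_mutex);
    return rc;
}
```

Calling run_lock_scenario() returns EDEADLK, because the nested call tries
to take a mutex its own thread already holds.  With a default (non-checking)
mutex the same call sequence would typically just block forever, which is
the hang lockdep is warning about.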