From: Jan Glauber
To: Marc Zyngier
Cc: kvm@vger.kernel.org, Paolo Bonzini, Radim Krčmář, Christoffer Dall, linux-arm-kernel@lists.infradead.org
Subject: Re: RCU stall with high number of KVM vcpus
Date: Tue, 14 Nov 2017 15:19:36 +0100
Message-ID: <20171114141936.GA21650@hc>

On Tue, Nov 14, 2017 at 01:30:07PM +0000, Marc Zyngier wrote:
> On 13/11/17 18:40, Jan Glauber wrote:
> > On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
> >> On 13/11/17 17:35, Jan Glauber wrote:
> >>> On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
> >
> > [...]
> >
> >>>> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
> >>>> is that a tooling issue?
> >>>
> >>> Every vcpu that oopses prints one line in parallel, so I get blocks like:
> >>> [58880.179814] [] ret_from_fork+0x10/0x18
> >>> [58880.179834] [] ret_from_fork+0x10/0x18
> >>> [58880.179847] [] ret_from_fork+0x10/0x18
> >>> [58880.179873] [] ret_from_fork+0x10/0x18
> >>> [58880.179893] [] ret_from_fork+0x10/0x18
> >>> [58880.179911] [] ret_from_fork+0x10/0x18
> >>> [58880.179917] [] ret_from_fork+0x10/0x18
> >>> [58880.180288] [] ret_from_fork+0x10/0x18
> >>> [58880.180303] [] ret_from_fork+0x10/0x18
> >>> [58880.180336] [] ret_from_fork+0x10/0x18
> >>> [58880.180363] [] ret_from_fork+0x10/0x18
> >>> [58880.180384] [] ret_from_fork+0x10/0x18
> >>> [58880.180415] [] ret_from_fork+0x10/0x18
> >>> [58880.180461] [] ret_from_fork+0x10/0x18
> >>>
> >>> I can send the full log if you want to have a look.
> >>
> >> Sure, send that over (maybe not over email though).
> >
> > Here is the guest dmesg:
> > http://paste.ubuntu.com/25955682/
>
> Yeah, that's because all the vcpus are getting starved at the same time,
> and spitting out interleaved traces... Not very useful anyway, as I
> think this is only a consequence of what's happening on the host.
>
> >
> > And the host dmesg as it might have been too big for the lists:
> > http://paste.ubuntu.com/25955699/
>
> And that one doesn't show much either, apart from indicating that
> something is keeping the lock for itself. Drat.
>
> We need to narrow down the problem, or make it appear on more common HW.
> Let me know if you've managed to reproduce it with non-VHE and/or on TX-1.

It also shows up when I disable VHE (CONFIG_ARM64_VHE).

I'll try enabling some tracepoints next.

--Jan