From: jan.glauber@caviumnetworks.com (Jan Glauber)
To: linux-arm-kernel@lists.infradead.org
Subject: RCU stall with high number of KVM vcpus
Date: Tue, 14 Nov 2017 15:19:36 +0100
Message-ID: <20171114141936.GA21650@hc>
In-Reply-To: <e8e1af91-b755-e04e-6ab4-c47b570c9fe0@arm.com>

On Tue, Nov 14, 2017 at 01:30:07PM +0000, Marc Zyngier wrote:
> On 13/11/17 18:40, Jan Glauber wrote:
> > On Mon, Nov 13, 2017 at 06:11:19PM +0000, Marc Zyngier wrote:
> >> On 13/11/17 17:35, Jan Glauber wrote:
> >>> On Mon, Nov 13, 2017 at 01:47:38PM +0000, Marc Zyngier wrote:
> > 
> > [...]
> > 
> >>>> Please elaborate. Messed in what way? Corrupted? The guest crashing? Or
> >>>> is that a tooling issue?
> >>>
> >>> Every vcpu that oopses prints one line in parallel, so I get blocks like:
> >>> [58880.179814] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179834] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179847] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179873] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179893] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179911] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.179917] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180288] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180303] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180336] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180363] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180384] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180415] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>> [58880.180461] [<ffff000008084b98>] ret_from_fork+0x10/0x18
> >>>
> >>> I can send the full log if you want to have a look.
> >>
> >> Sure, send that over (maybe not over email though).
> > 
> > Here is the guest dmesg:
> > http://paste.ubuntu.com/25955682/
> 
> Yeah, that's because all the vcpus are getting starved at the same time,
> and spitting out interleaved traces... Not very useful anyway, as I
> think this is only a consequence of what's happening on the host.
> 
> > 
> > And the host dmesg as it might have been too big for the lists:
> > http://paste.ubuntu.com/25955699/
> 
> And that one doesn't show much either, apart from indicating that
> something is keeping the lock for itself. Drat.
> 
> We need to narrow down the problem, or make it appear on more common HW.
> Let me know if you've managed to reproduce it with non-VHE and/or on TX-1.

It also shows up when I disable VHE (CONFIG_ARM64_VHE). I'll try
enabling some tracepoints next.

--Jan
