From: Sean Christopherson <seanjc@google.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: dozens of qemu/kvm VMs getting into stuck states since kernel ~5.13
Date: Tue, 7 Dec 2021 22:25:51 +0000
Message-ID: <Ya/fb2Lc6OoHw7CP@google.com>
In-Reply-To: <CAJCQCtSx_OFkN1csWGQ2-pP1jLgziwr0oXoMMb4q8Y=UYPGqAg@mail.gmail.com>
On Tue, Dec 07, 2021, Chris Murphy wrote:
> cc: qemu-devel
>
> Hi,
>
> I'm trying to help progress a very troublesome and so far elusive bug
> we're seeing in Fedora infrastructure. When running dozens of qemu-kvm
> VMs simultaneously, eventually they become unresponsive, as well as
> new processes as we try to extract information from the host about
> what's gone wrong.
Have you tried bisecting? IIUC, the issues showed up between v5.11 and v5.12.12,
so bisecting should be relatively straightforward.
> Systems (Fedora openQA worker hosts) on kernel 5.12.12+ wind up in a
> state where forking does not work correctly, breaking most things
> https://bugzilla.redhat.com/show_bug.cgi?id=2009585
>
> In subsequent testing, we used newer kernels with lockdep and other
> debug stuff enabled, and managed to capture a hung task with a bunch
> of locks listed, including kvm and qemu processes. But I can't parse
> it.
>
> 5.15-rc7
> https://bugzilla-attachments.redhat.com/attachment.cgi?id=1840941
> 5.15+
> https://bugzilla-attachments.redhat.com/attachment.cgi?id=1840939
>
> If anyone can take a glance at those kernel messages, and/or give
> hints how we can extract more information for debugging, it'd be
> appreciated. Maybe all of that is normal and the actual problem isn't
> in any of these traces.
All the instances of

  (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x720 [kvm]

are uninteresting and expected; that's just each vCPU task taking its associated
vcpu->mutex, likely for KVM_RUN.
At a glance, the XFS stuff looks far more interesting/suspect.
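
To make the remaining locks easier to spot, the vcpu->mutex entries can be
filtered out of the hung-task output before reading it. The following is only a
rough sketch: the helper name drop_vcpu_mutex is made up for illustration, the
first sample line is the one quoted above, and the second sample line is a
made-up placeholder, not taken from the attached logs.

```python
import re

# Matches a lockdep held-lock entry for a vCPU mutex, e.g.
#   (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x720 [kvm]
VCPU_MUTEX = re.compile(r"\(&vcpu->mutex\)")


def drop_vcpu_mutex(lines):
    """Return only the lines that do not mention a held vcpu->mutex."""
    return [line for line in lines if not VCPU_MUTEX.search(line)]


sample = [
    # Entry quoted in this message.
    "(&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x720 [kvm]",
    # Hypothetical placeholder entry, NOT from the real logs.
    "(&hypothetical_lock){+.+.}-{3:3}, at: hypothetical_fn+0x0/0x0",
]

remaining = drop_vcpu_mutex(sample)
print("\n".join(remaining))
```

In practice the input would be the lines of the saved dmesg attachment rather
than an inline list; whatever survives the filter is what deserves a closer
look.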