From: Sean Christopherson <seanjc@google.com>
To: mike tancsa <mike@sentex.net>
Cc: Igor Mammedov <imammedo@redhat.com>,
kvm@vger.kernel.org, Leonardo Bras <leobras@redhat.com>
Subject: Re: Guest migration between different Ryzen CPU generations
Date: Fri, 3 Jun 2022 15:09:06 +0000
Message-ID: <YpokEm84nqVXuOCA@google.com>
In-Reply-To: <ce81de90-3dd1-1e8a-6a8f-b1c18310cb08@sentex.net>

On Fri, Jun 03, 2022, mike tancsa wrote:
> On 6/2/2022 5:46 PM, Sean Christopherson wrote:
> > On Thu, Jun 02, 2022, mike tancsa wrote:
> > > On 6/2/2022 8:42 AM, Igor Mammedov wrote:
> > > > On Tue, 31 May 2022 13:00:07 -0400
> > > > mike tancsa <mike@sentex.net> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have been using kvm since the Ubuntu 18 and 20.x LTS series of
> > > > > kernels and distributions without any issues on a whole range of Guests
> > > > > up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
> > > > > the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
> > > > > (3700x). Migrations back and forth worked without issue for Ubuntu 20.x
> > > > > kernels. The first Ubuntu 22 machine was on identical hardware and all
> > > > > was good with that too. The second Ubuntu 22 based machine was spun up
> > > > > with a newer gen Ryzen, a 5800x. On the initial kernel version that
> > > > > came with that release back in April, migrations worked as expected
> > > > > between hardware as well as different kernel versions and qemu / KVM
> > > > > versions that come default with the distribution. Not sure if migrations
> > > > > between kernel and KVM versions "accidentally" worked all these years,
> > > > > but they did. However, we ran into an issue with the kernel
> > > > > 5.15.0-33-generic (possibly with 5.15.0-30 as well) that's part of
> > > > > Ubuntu. Migrations no longer worked to older generation CPUs. I could
> > > > > send a guest TO the box and all was fine, but upon sending the guest to
> > > > > another hypervisor, the sender would see it as successfully migrated,
> > > > > but the VM would typically just hang, with 100% CPU utilization, or
> > > > > sometimes crash. I tried a 5.18 kernel from May 22nd and again the
> > > > > behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
> > > > > migrate back and forth.
> > > >
> > > > Perhaps you are hitting the issue fixed by:
> > > > https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@mail.gmail.com/T/
> > > >
> > > Thanks for the response. I am not sure.
> >
> > I suspect Igor is right. PKRU/PKU, the offending XSAVE feature in that bug, is
> > in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
> > also be fixed in v5.18.
> >
> > Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
> > to figure out when the bug was actually introduced. The v5.15 code base is quite
> > different due to a rather massive FPU rework in v5.16. That fix definitely would
> > not apply cleanly, but it doesn't mean that the underlying root cause is different,
> > e.g. the buggy code could easily have been lurking for multiple kernel versions
> > before the rework in v5.16.
> >
> > > That patch is from Feb. Would the bug have been introduced sometime in May to
> > > the 5.15 kernel that Ubuntu 22 would have tracked?
> >
> > Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
> > it's not uncommon for a change to be backported to a stable kernel weeks/months
> > after it initially landed in the upstream tree.
> >
> > Is moving to v5.17 or later an option for you? If not, what was the "original"
> > Ubuntu 22 kernel version that worked? Ideally, assuming it's the same FPU/PKU bug,
> > the fix would be backported to v5.15, but that's likely going to be quite difficult,
> > especially without knowing exactly which commit introduced the bug.
>
> Thanks Sean, I can, but it just means adjusting our workflow a bit. For our
> hypervisors we like to just track LTS and be conservative in what software
> we install and stick with apps and kernels designed specifically to work
> with that release / distribution.

Yeah, tracking LTS is the right thing to do. I'll try to verify and bisect the bug,
and then get the fix backported to v5.15.y, but it may be a week or two before that
happens.

> The Ubuntu 22 kernel that worked back in April was 5.15.0-25-generic. TBH,
> if I am told we were just lucky things worked with different hardware and
> different kernels and KVM versions (i.e. migrating bidirectionally from
> ubuntu 20.x to 22.x) I would be fine with that too. But I was a little
> surprised that a kernel version bump from 5.15 would break what was working.

Migrating between kernel/KVM versions is absolutely supposed to work; this is
firmly a kernel bug.
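
A rough way to confirm that PKU is among the features that differ between the
two hosts is to diff the "flags" lines from /proc/cpuinfo. The sketch below is
illustrative only and is not part of any kernel fix; the helper names are made
up, and it assumes the usual x86 /proc/cpuinfo layout, where protection keys
show up as "pku" (and "ospke" once the OS has enabled them).

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; on a typical system every
    logical CPU reports the same set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_only_on(src_text, dst_text):
    """Sorted list of flags the source host has that the destination lacks."""
    return sorted(parse_flags(src_text) - parse_flags(dst_text))
```

Feeding it the contents of /proc/cpuinfo saved from the 5800x and a 3700x
should list the features that exist only on the newer part; if "pku" is in
that list, the guest may have been given a feature the older host cannot
restore.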