public inbox for linux-kernel@vger.kernel.org
From: "Gowans, James" <jgowans@amazon.com>
To: "bhe@redhat.com" <bhe@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"maz@kernel.org" <maz@kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"sre@kernel.org" <sre@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"seanjc@google.com" <seanjc@google.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"arnd@arndb.de" <arnd@arndb.de>, "wens@csie.org" <wens@csie.org>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"Schönherr, Jan H." <jschoenh@amazon.de>, "Graf (AWS),
	Alexander" <graf@amazon.de>,
	"orsonzhai@gmail.com" <orsonzhai@gmail.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"samuel@sholland.org" <samuel@sholland.org>,
	"pavel@ucw.cz" <pavel@ucw.cz>,
	"jernej.skrabec@gmail.com" <jernej.skrabec@gmail.com>
Subject: Re: [PATCH] kexec: do syscore_shutdown() in kernel_kexec
Date: Tue, 19 Dec 2023 07:41:52 +0000
Message-ID: <9ffa2c4d3e808feb2afa6f02f4afabf1cd674516.camel@amazon.com>
In-Reply-To: <ZYEafpms++a3a8ch@MiWiFi-R3L-srv>

On Tue, 2023-12-19 at 12:22 +0800, Baoquan He wrote:
> Add Andrew to CC as Andrew helps to pick kexec/kdump patches.

Ah, thanks, I didn't realise that Andrew pulls in the kexec patches.
> 
> On 12/13/23 at 08:40am, James Gowans wrote:
> ......
> > This has been tested by doing a kexec on x86_64 and aarch64.
> 
> Hi James,
> 
> Thanks for this great patch. My colleagues have opened bug in rhel to
> track this and try to veryfy this patch. However, they can't reproduce
> the issue this patch is fixing. Could you tell more about where and how
> to reproduce so that we can be aware of it better? Thanks in advance.

Sure! The TL;DR is: run a VMX (Intel x86) KVM VM on Linux v6.4+ and do a
kexec while the KVM VM is still running. Before this patch the system
will triple fault.

In more detail:
Run a bare metal host on a modern Intel CPU with VMX support. The kernel
I was using was 6.7.0-rc5+.
You can also do this with a QEMU "host", btw; that's how I did the
debugging, attaching GDB to it to figure out what was going on.

If you want a virtual "host", launch it with:

-cpu host -M q35,kernel-irqchip=split,accel=kvm -enable-kvm

Launch a KVM guest VM, eg:

qemu-system-x86_64 \
  -enable-kvm \
  -cdrom alpine-virt-3.19.0-x86_64.iso \
  -nodefaults -nographic -M q35 \
  -serial mon:stdio

While the guest VM is *still running* do a kexec on the host, eg:

kexec -l --reuse-cmdline --initrd=config-6.7.0-rc5+ vmlinuz-6.7.0-rc5+ && \
  kexec -e

The kexec can be to anything, but I generally just kexec to the same
kernel/ramdisk as is currently running. Ie: same-version kexec.

Before this patch the kexec will get stuck; after it, the kexec will go
smoothly and the system will end up in the new kernel in a few seconds.
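
The two preconditions mentioned above (a VMX-capable CPU and a v6.4+
kernel) can be checked before attempting the repro. The following is only
a minimal sketch: the helper names has_vmx and kernel_at_least are
hypothetical, and it assumes the usual "flags" line format of
/proc/cpuinfo on x86.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for the reproduction steps.
# They are illustrative only and not part of the patch.

# has_vmx [CPUINFO]: succeed if a "flags" line in CPUINFO
# (default: /proc/cpuinfo) lists the "vmx" CPU flag.
has_vmx() {
    grep -Eq '^flags[[:space:]]*:.*[[:space:]]vmx([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}"
}

# kernel_at_least MAJOR MINOR [RELEASE]: succeed if RELEASE
# (default: output of uname -r) is at least MAJOR.MINOR.
# Only the leading "X.Y" of the release string is compared.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}        # "6.7.0-rc5+" -> "6"
    rest=${release#*.}          # "6.7.0-rc5+" -> "7.0-rc5+"
    minor=${rest%%[!0-9]*}      # "7.0-rc5+"   -> "7"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example, "has_vmx && kernel_at_least 6 4" succeeding suggests the
host matches the setup described here; it says nothing about whether a
guest is actually running.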

I hope those steps are clear and that you can repro this.

BTW, it's important for the KVM VM to still be running when the host
does the kexec because KVM internally maintains a usage counter and
will disable virtualisation once all VMs have been terminated, via:

__fput(kvm_fd)
  kvm_vm_release
    kvm_destroy_vm
      hardware_disable_all
        hardware_disable_all_nolock
          kvm_usage_count--;
          if (!kvm_usage_count)
            on_each_cpu(hardware_disable_nolock, NULL, 1);
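
The effect of that counter can be illustrated with a toy userspace model.
This is only a sketch and not kernel code: the vmxe variable stands in
for the VMXE state on all CPUs, and the enable-on-first-VM side in
vm_create is my assumption, since only the disable path is shown in the
trace above.

```shell
#!/bin/sh
# Toy userspace model of KVM's usage counter; NOT kernel code.
kvm_usage_count=0
vmxe=0    # stands in for VMXE being set on every CPU

# Models VM creation. Assumption: the first VM enables virtualisation.
vm_create() {
    kvm_usage_count=$((kvm_usage_count + 1))
    if [ "$kvm_usage_count" -eq 1 ]; then
        vmxe=1
    fi
}

# Models the final fput() of a VM fd: the counter drops, and the last
# VM to go away disables virtualisation again.
vm_destroy() {
    kvm_usage_count=$((kvm_usage_count - 1))
    if [ "$kvm_usage_count" -eq 0 ]; then
        vmxe=0
    fi
}
```

In this model vmxe stays at 1 for as long as at least one VM exists,
which is the state the host is in when the kexec gets stuck.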

So if all KVM fds are closed then kexec will work because VMXE is
cleared on all CPUs when the last VM is destroyed. If the KVM fds are
still open (ie: QEMU process still exists) then the issue manifests. It
sounds nasty to do a kexec while QEMU processes are still around, but
this is a perfectly normal flow for live update:
1. Pause and serialise VM state
2. kexec
3. Deserialise and resume VMs
In that flow there's no need to actually kill the QEMU process: as long
as the VM is *paused* and has been serialised, we can happily kexec.
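
To confirm that the host really is in the "KVM fds still open" state
before the kexec, one option is to scan /proc for VM file descriptors.
This is a rough sketch: count_kvm_vm_fds is a hypothetical helper, and it
assumes VM fds show up as "anon_inode:kvm-vm" links under /proc/<pid>/fd,
which generally needs root to read for other users' processes.

```shell
#!/bin/sh
# count_kvm_vm_fds [PROC_ROOT]: print how many open file descriptors under
# PROC_ROOT (default: /proc) look like KVM VM fds.
# Assumption: such fds appear as links to "anon_inode:kvm-vm".
count_kvm_vm_fds() {
    proc_root=${1:-/proc}
    count=0
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Unreadable, vanished, or unmatched entries make readlink fail.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            anon_inode:kvm-vm) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}
```

A non-zero result means at least one VM is still holding virtualisation
enabled, so the kexec is exercising the path this patch is about.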

JG


