From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Albert Huang <huangjie.albert@bytedance.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
Masahiro Yamada <masahiroy@kernel.org>,
Michal Marek <michal.lkml@markovi.net>,
Nick Desaulniers <ndesaulniers@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Michael Roth <michael.roth@amd.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Nathan Chancellor <nathan@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Sean Christopherson <seanjc@google.com>,
Joerg Roedel <jroedel@suse.de>,
Mark Rutland <mark.rutland@arm.com>,
Kees Cook <keescook@chromium.org>,
linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
linux-kbuild@vger.kernel.org
Subject: Re: [PATCH 0/4] faster kexec reboot
Date: Mon, 25 Jul 2022 12:04:30 -0500
Message-ID: <8735epf7j5.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20220725083904.56552-1-huangjie.albert@bytedance.com> (Albert Huang's message of "Mon, 25 Jul 2022 16:38:52 +0800")
Albert Huang <huangjie.albert@bytedance.com> writes:
> From: "huangjie.albert" <huangjie.albert@bytedance.com>
>
> In many time-sensitive scenarios, we need the kernel to restart more
> quickly. However, the current kexec fast restart path performs memory
> copy, verification and decompression operations that together take
> more than 500ms. With the following patch series,
> machine_kexec-->start_kernel takes only 15ms.
Is this a tiny embedded device you are taking the timings of?
How are you handling driver shutdown and restart? I would expect those
to be a larger piece of the puzzle than memory.
My desktop can do something like 128GiB/s. Which would suggest that
copying 128MiB of kernel+initrd would take perhaps 10ms. The SHA256
implementation may not be tuned so that could be part of the performance
issue. The SHA256 hash has a reputation for having fast
implementations. I chose SHA256 originally simply because it has more
bits so it makes the odds of detecting an error higher.
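A quick sanity check of the copy estimate above, as a minimal sketch. The function name copy_time_us is hypothetical, and the model assumes the whole bandwidth is available to one straight copy, which real hardware will not quite deliver. Under that assumption, 128MiB at 128GiB/s comes to roughly 1ms, so the 10ms figure already leaves about an order of magnitude of slack, and both are far below the 500ms reported.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ideal time, in microseconds, to copy `bytes` at `bytes_per_sec`.
 * Assumption: a single uninterrupted copy at full bandwidth; cache
 * effects and any other overhead are ignored.
 */
static double copy_time_us(uint64_t bytes, uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (double)bytes * 1e6 / (double)bytes_per_sec;
}
```

Calling copy_time_us(128ULL << 20, 128ULL << 30) gives a little under 1000 microseconds.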
If all you care about is booting a kernel as fast as possible it may
make sense to have a large reserved region of memory like we have for
the kexec on panic kernel. If that really makes sense I recommend
adding a second kernel command line option and reserving a second region
of memory. That makes telling if there are any conflicts simple.
I am having a hard time seeing how anyone else would want these options.
Losing megabytes of memory simply because you might reboot using kexec
seems like the wrong side of a trade-off.
The CONFIG_KEXEC_PURGATORY_SKIP_SIG option is very misnamed. It is not
signature verification that is happening; it is hash verification.
There are no encrypted bits in play. Instead there is a check to
ensure that the kernel has not been corrupted by in-flight DMA that some
driver forgot to shut down.
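To illustrate the kind of check being described, here is a minimal sketch that records a digest when an image is loaded and compares it again before jumping. It is not the purgatory code: the struct and function names are hypothetical, and FNV-1a is swapped in for SHA256 purely to keep the example short. FNV-1a is not a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit. Stand-in for SHA256 in this sketch only. */
static uint64_t fnv1a64(const unsigned char *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Hypothetical descriptor for one loaded image segment. */
struct segment {
    const unsigned char *start;
    size_t len;
    uint64_t digest;                      /* recorded at load time */
};

/* Record the digest once the segment is in place. */
static void segment_seal(struct segment *s)
{
    s->digest = fnv1a64(s->start, s->len);
}

/* Return 1 if the segment still matches its recorded digest, else 0. */
static int segment_verify(const struct segment *s)
{
    return fnv1a64(s->start, s->len) == s->digest;
}
```

A stray write into the segment between segment_seal() and segment_verify(), which is what a device still doing DMA would amount to, makes the verify step fail instead of letting a damaged kernel run.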
So you are building a version of kexec that if something goes wrong it
could very easily eat your data, or otherwise do some very bad things
that are absolutely non-trivial to debug.
That the decision to skip the sha256 hash that prevents corruption is
happening at compile time, instead of at run-time, will guarantee the
option is simply not available on any general purpose kernel
configuration. Given how dangerous it is to skip the hash verification
it is probably not a bad thing overall, but it is most definitely
something that will make maintenance more difficult.
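The difference between the two kinds of decision can be sketched as follows. Everything here is hypothetical: EXAMPLE_SKIP_HASH stands in for a build-time option, and struct load_flags stands in for whatever a loader might pass at run time. Neither is taken from the patches.

```c
#include <stdbool.h>

/*
 * Build-time decision: one answer is baked into the binary, so a
 * general purpose build has to pick the safe one for everybody.
 */
#ifdef EXAMPLE_SKIP_HASH
static bool must_verify_build(void) { return false; }
#else
static bool must_verify_build(void) { return true; }
#endif

/* Hypothetical per-load flags supplied when the image is loaded. */
struct load_flags {
    bool skip_hash;
};

/*
 * Run-time decision: the same binary can serve both the cautious
 * default and a caller that explicitly opts out.
 */
static bool must_verify_runtime(const struct load_flags *f)
{
    return !f->skip_hash;
}
```

With the build-time form, skipping the check means shipping a different binary. With the run-time form, the default stays safe and the opt-out is visible at the call site.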
If done well I don't see why anyone would mind an uncompressed kernel
but I don't see what the advantage of what you are doing is over using
vmlinux in the build directory. It isn't a bzImage but it is the
uncompressed kernel.
As a proof of concept I think what you are doing goes some way toward
showing that things can be improved. My overall sense is that improving
things the way you are proposing does not help the general case and
simply adds to the maintenance burden.
Eric
>
> How to measure time:
>
> C code:
>   uint64_t current_cycles(void)
>   {
>           uint32_t low, high;
>           asm volatile("rdtsc" : "=a"(low), "=d"(high));
>           return ((uint64_t)low) | ((uint64_t)high << 32);
>   }
> assembly code:
>   pushq %rax
>   pushq %rdx
>   rdtsc
>   mov %eax,%eax
>   shl $0x20,%rdx
>   or %rax,%rdx
>   movq %rdx,0x840(%r14)
>   popq %rdx
>   popq %rax
> The timestamps can be stored in boot_params or the kexec control page,
> so we can read all of them after the kernel has booted.
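The quoted measurement yields raw cycle counts, so turning two of them into a time needs the TSC frequency. A minimal sketch, assuming a constant-rate TSC whose frequency in kHz is known; the function name and the tsc_khz parameter are hypothetical and not part of the patches:

```c
#include <stdint.h>

/*
 * Convert a pair of TSC readings to elapsed milliseconds.
 * A TSC ticking at tsc_khz kHz advances tsc_khz cycles per millisecond.
 * Assumption: the TSC is constant-rate and `end` was read after `start`.
 */
static double cycles_to_ms(uint64_t start, uint64_t end, uint64_t tsc_khz)
{
    return (double)(end - start) / (double)tsc_khz;
}
```

For example, a difference of 45,000,000 cycles on a 3GHz TSC (tsc_khz = 3000000) corresponds to 15ms.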
>
> huangjie.albert (4):
> kexec: reuse crash kernel reserved memory for normal kexec
> kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
> x86: Support the uncompressed kernel to speed up booting
> x86: boot: avoid memory copy if kernel is uncompressed
>
> arch/x86/Kconfig | 10 +++++++++
> arch/x86/boot/compressed/Makefile | 5 ++++-
> arch/x86/boot/compressed/head_64.S | 8 +++++--
> arch/x86/boot/compressed/misc.c | 35 +++++++++++++++++++++++++-----
> arch/x86/purgatory/purgatory.c | 7 ++++++
> include/linux/kexec.h | 9 ++++----
> include/uapi/linux/kexec.h | 2 ++
> kernel/kexec.c | 19 +++++++++++++++-
> kernel/kexec_core.c | 16 ++++++++------
> kernel/kexec_file.c | 20 +++++++++++++++--
> scripts/Makefile.lib | 5 +++++
> 11 files changed, 114 insertions(+), 22 deletions(-)
Thread overview: 19+ messages
2022-07-25 8:38 [PATCH 0/4] faster kexec reboot Albert Huang
2022-07-25 8:38 ` [PATCH 1/4] kexec: reuse crash kernel reserved memory for normal kexec Albert Huang
2022-07-25 12:02 ` Jason A. Donenfeld
2022-07-25 12:56 ` Fwd: [External] " 黄杰
2022-07-25 13:30 ` 黄杰
2022-07-25 8:38 ` [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG Albert Huang
2022-07-25 12:15 ` Jason A. Donenfeld
2022-07-25 13:32 ` [External] " 黄杰
2022-07-28 1:57 ` 黄杰
2022-07-25 12:56 ` Fwd: " 黄杰
2022-07-25 8:38 ` [PATCH 3/4] x86: Support the uncompressed kernel to speed up booting Albert Huang
2022-07-25 12:55 ` Fwd: " 黄杰
2022-07-25 16:57 ` Eric W. Biederman
2022-07-25 8:38 ` [PATCH 4/4] x86: boot: avoid memory copy if kernel is uncompressed Albert Huang
2022-07-25 12:55 ` Fwd: " 黄杰
2022-07-25 12:54 ` Fwd: [PATCH 0/4] faster kexec reboot 黄杰
2022-07-25 17:04 ` Eric W. Biederman [this message]
2022-07-26 5:53 ` [External] " 黄杰
2022-07-28 1:55 ` 黄杰