From: Pingfan Liu <piliu@redhat.com>
To: Philipp Rudo <prudo@redhat.com>
Cc: kexec@lists.infradead.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
Jeremy Linton <jeremy.linton@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
Simon Horman <horms@kernel.org>,
Gerd Hoffmann <kraxel@redhat.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Viktor Malik <vmalik@redhat.com>,
Jan Hendrik Farr <kernel@jfarr.cc>, Baoquan He <bhe@redhat.com>,
Dave Young <dyoung@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
bpf@vger.kernel.org, systemd-devel@lists.freedesktop.org
Subject: Re: [PATCHv5 05/12] kexec: Introduce kexec_pe_image to parse and load PE file
Date: Wed, 17 Sep 2025 21:04:28 +0800 [thread overview]
Message-ID: <aMqx3PLkYLp3FLOD@fedora> (raw)
In-Reply-To: <20250901163020.30ce3c1e@rotkaeppchen>
On Mon, Sep 01, 2025 at 04:30:20PM +0200, Philipp Rudo wrote:
> Hi Pingfan,
>
> a few nits in addition to what is mentioned in the cover letter.
>
Besides the following comment, as we agree on your suggestion, many of
the logic in this file will be moved to kimage_file_prepare_segments().
> On Tue, 19 Aug 2025 09:24:21 +0800
> Pingfan Liu <piliu@redhat.com> wrote:
>
> > As UEFI becomes popular, a few architectures support to boot a PE format
> > kernel image directly. But the internal of PE format varies, which means
> > each parser for each format.
> >
> > This patch (with the rest in this series) introduces a common skeleton
> > to all parsers, and leave the format parsing in
> > bpf-prog, so the kernel code can keep relative stable.
> >
> > A new kexec_file_ops is implementation, named pe_image_ops.
> >
> > There are some place holder function in this patch. (They will take
> > effect after the introduction of kexec bpf light skeleton and bpf
> > helpers). Overall the parsing progress is a pipeline, the current
> > bpf-prog parser is attached to bpf_handle_pefile(), and detatched at the
> > end of the current stage 'disarm_bpf_prog()' the current parsed result
> > by the current bpf-prog will be buffered in kernel 'prepare_nested_pe()'
> > , and deliver to the next stage. For each stage, the bpf bytecode is
> > extracted from the '.bpf' section in the PE file.
> >
> > Signed-off-by: Pingfan Liu <piliu@redhat.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Philipp Rudo <prudo@redhat.com>
> > To: kexec@lists.infradead.org
> > ---
> > include/linux/kexec.h | 1 +
> > kernel/Kconfig.kexec | 9 ++
> > kernel/Makefile | 1 +
> > kernel/kexec_pe_image.c | 348 ++++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 359 insertions(+)
> > create mode 100644 kernel/kexec_pe_image.c
>
> [...]
>
> > new file mode 100644
> > index 0000000000000..b0cf9942e68d2
> > --- /dev/null
> > +++ b/kernel/kexec_pe_image.c
> > @@ -0,0 +1,348 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Kexec PE image loader
> > +
> > + * Copyright (C) 2025 Red Hat, Inc
> > + */
> > +
> > +#define pr_fmt(fmt) "kexec_file(Image): " fmt
> > +
> > +#include <linux/err.h>
> > +#include <linux/errno.h>
> > +#include <linux/list.h>
> > +#include <linux/kernel.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/kexec.h>
> > +#include <linux/pe.h>
> > +#include <linux/string.h>
> > +#include <linux/bpf.h>
> > +#include <linux/filter.h>
> > +#include <asm/byteorder.h>
> > +#include <asm/image.h>
> > +#include <asm/memory.h>
> > +
> > +
> > +#define KEXEC_RES_KERNEL_NAME "kexec:kernel"
> > +#define KEXEC_RES_INITRD_NAME "kexec:initrd"
> > +#define KEXEC_RES_CMDLINE_NAME "kexec:cmdline"
> > +
> > +struct kexec_res {
> > + char *name;
> > + /* The free of buffer is deferred to kimage_file_post_load_cleanup */
> > + struct mem_range_result *r;
> > +};
> > +
> > +static struct kexec_res parsed_resource[3] = {
> > + { KEXEC_RES_KERNEL_NAME, },
> > + { KEXEC_RES_INITRD_NAME, },
> > + { KEXEC_RES_CMDLINE_NAME, },
> > +};
> > +
> > +static bool pe_has_bpf_section(const char *file_buf, unsigned long pe_sz);
> > +
> > +static bool is_valid_pe(const char *kernel_buf, unsigned long kernel_len)
> > +{
> > + struct mz_hdr *mz;
> > + struct pe_hdr *pe;
> > +
> > + if (!kernel_buf)
> > + return false;
> > + mz = (struct mz_hdr *)kernel_buf;
> > + if (mz->magic != IMAGE_DOS_SIGNATURE)
> > + return false;
> > + pe = (struct pe_hdr *)(kernel_buf + mz->peaddr);
> > + if (pe->magic != IMAGE_NT_SIGNATURE)
> > + return false;
> > + if (pe->opt_hdr_size == 0) {
> > + pr_err("optional header is missing\n");
> > + return false;
> > + }
> > +
> > + return pe_has_bpf_section(kernel_buf, kernel_len);
> > +}
>
> Also check for
> pe32plus_opt_hdr->subsys == IMAGE_SUBSYSTEM_EFI_APPLICATION?
>
Yes, it would be better.
> > +
> > +static bool is_valid_format(const char *kernel_buf, unsigned long kernel_len)
> > +{
> > + return is_valid_pe(kernel_buf, kernel_len);
> > +}
> > +
> > +/*
> > + * The UEFI Terse Executable (TE) image has MZ header.
> > + */
> > +static int pe_image_probe(const char *kernel_buf, unsigned long kernel_len)
> > +{
> > + return is_valid_pe(kernel_buf, kernel_len) ? 0 : -1;
> > +}
> > +
> > +static int pe_get_section(const char *file_buf, const char *sect_name,
> > + char **sect_start, unsigned long *sect_sz)
> > +{
> > + struct pe_hdr *pe_hdr;
> > + struct pe32plus_opt_hdr *opt_hdr;
> > + struct section_header *sect_hdr;
> > + int section_nr, i;
> > + struct mz_hdr *mz = (struct mz_hdr *)file_buf;
> > +
> > + *sect_start = NULL;
> > + *sect_sz = 0;
> > + pe_hdr = (struct pe_hdr *)(file_buf + mz->peaddr);
> > + section_nr = pe_hdr->sections;
> > + opt_hdr = (struct pe32plus_opt_hdr *)(file_buf + mz->peaddr + sizeof(struct pe_hdr));
> > + sect_hdr = (struct section_header *)((char *)opt_hdr + pe_hdr->opt_hdr_size);
> > +
> > + for (i = 0; i < section_nr; i++) {
> > + if (strcmp(sect_hdr->name, sect_name) == 0) {
> > + *sect_start = (char *)file_buf + sect_hdr->data_addr;
> > + *sect_sz = sect_hdr->raw_data_size;
> > + return 0;
> > + }
> > + sect_hdr++;
> > + }
> > +
> > + return -1;
> > +}
> > +
> > +static bool pe_has_bpf_section(const char *file_buf, unsigned long pe_sz)
> > +{
> > + char *sect_start = NULL;
> > + unsigned long sect_sz = 0;
> > + int ret;
> > +
> > + ret = pe_get_section(file_buf, ".bpf", §_start, §_sz);
> > + if (ret < 0)
> > + return false;
> > + return true;
> > +}
> > +
> > +/* Load a ELF */
> > +static int arm_bpf_prog(char *bpf_elf, unsigned long sz)
> > +{
> > + return 0;
> > +}
> > +
> > +static void disarm_bpf_prog(void)
> > +{
> > +}
> > +
> > +struct kexec_context {
> > + bool kdump;
> > + char *image;
> > + int image_sz;
> > + char *initrd;
> > + int initrd_sz;
> > + char *cmdline;
> > + int cmdline_sz;
> > +};
> > +
> > +void bpf_handle_pefile(struct kexec_context *context);
> > +void bpf_post_handle_pefile(struct kexec_context *context);
> > +
> > +
> > +/*
> > + * optimize("O0") prevents inline, compiler constant propagation
> > + */
> > +__attribute__((used, optimize("O0"))) void bpf_handle_pefile(struct kexec_context *context)
> > +{
> > + /*
> > + * To prevent linker from Identical Code Folding (ICF) with bpf_handle_pefile,
> > + * making them have different code.
> > + */
> > + volatile int dummy = 0;
> > +
> > + dummy += 1;
> > +}
> > +
> > +__attribute__((used, optimize("O0"))) void bpf_post_handle_pefile(struct kexec_context *context)
> > +{
> > + volatile int dummy = 0;
> > +
> > + dummy += 2;
> > +}
> > +
> > +/*
> > + * PE file may be nested and should be unfold one by one.
> > + * Query 'kernel', 'initrd', 'cmdline' in cur_phase, as they are inputs for the
> > + * next phase.
> > + */
> > +static int prepare_nested_pe(char **kernel, unsigned long *kernel_len, char **initrd,
> > + unsigned long *initrd_len, char **cmdline)
> > +{
> > + struct kexec_res *res;
> > + int ret = -1;
> > +
> > + *kernel = NULL;
> > + *kernel_len = 0;
> > +
> > + res = &parsed_resource[0];
> > + if (!!res->r) {
> > + *kernel = res->r->buf;
> > + *kernel_len = res->r->data_sz;
> > + ret = 0;
> > + }
> > +
> > + res = &parsed_resource[1];
> > + if (!!res->r) {
> > + *initrd = res->r->buf;
> > + *initrd_len = res->r->data_sz;
> > + }
> > +
> > + res = &parsed_resource[2];
> > + if (!!res->r) {
> > + *cmdline = res->r->buf;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static void *pe_image_load(struct kimage *image,
> > + char *kernel, unsigned long kernel_len,
> > + char *initrd, unsigned long initrd_len,
> > + char *cmdline, unsigned long cmdline_len)
> > +{
> > + char *linux_start, *initrd_start, *cmdline_start, *bpf_start;
> > + unsigned long linux_sz, initrd_sz, cmdline_sz, bpf_sz;
>
> I don't see a point in defining all the
> {linux,initrd,cmdline}_{start,sz} variables. Either you could reuse
> the corresponding {kernel,initrd,cmdline} variables from the function
> definition. Or better use a kexec_context that contains the same
> information.
>
OK, I will remove them.
> > + struct kexec_res *res;
> > + struct mem_range_result *r;
> > + void *ldata;
> > + int ret;
> > +
> > + linux_start = kernel;
> > + linux_sz = kernel_len;
> > + initrd_start = initrd;
> > + initrd_sz = initrd_len;
> > + cmdline_start = cmdline;
> > + cmdline_sz = cmdline_len;
> > +
> > + while (is_valid_format(linux_start, linux_sz) &&
> > + pe_has_bpf_section(linux_start, linux_sz)) {
> > + struct kexec_context context;
> > +
> > + pe_get_section((const char *)linux_start, ".bpf", &bpf_start, &bpf_sz);
> > + if (!!bpf_sz) {
> > + /* load and attach bpf-prog */
> > + ret = arm_bpf_prog(bpf_start, bpf_sz);
> > + if (ret) {
> > + pr_err("Fail to load .bpf section\n");
> > + ldata = ERR_PTR(ret);
> > + goto err;
> > + }
> > + }
> > + if (image->type != KEXEC_TYPE_CRASH)
> > + context.kdump = false;
> > + else
> > + context.kdump = true;
>
> The bpf_prog cannot change whether kexec is used to load a kdump or
> normal kernel. So this check can be moved outside the loop.
>
Original, this assignment happens only if the file has valid bpf-prog,
which is the loop body. But I have no strong opinion about it. I can
declare context outside the loop as __maybe_unused.
> > + context.image = linux_start;
> > + context.image_sz = linux_sz;
> > + context.initrd = initrd_start;
> > + context.initrd_sz = initrd_sz;
> > + context.cmdline = cmdline_start;
> > + context.cmdline_sz = strlen(cmdline_start);
> > + /* bpf-prog fentry, which handle above buffers. */
> > + bpf_handle_pefile(&context);
> > +
> > + prepare_nested_pe(&linux_start, &linux_sz, &initrd_start,
> > + &initrd_sz, &cmdline_start);
> > + /* bpf-prog fentry */
> > + bpf_post_handle_pefile(&context);
> > + /*
> > + * detach the current bpf-prog from their attachment points.
> > + */
> > + disarm_bpf_prog();
> > + }
> > +
> > + /*
> > + * image's kernel_buf, initrd_buf, cmdline_buf are set. Now they should
> > + * be updated to the new content.
> > + */
> > +
> > + res = &parsed_resource[0];
> > + /* Kernel part should always be parsed */
> > + if (!res->r) {
> > + pr_err("Can not parse kernel\n");
> > + ldata = ERR_PTR(-EINVAL);
> > + goto err;
> > + }
> > + kernel = res->r->buf;
> > + kernel_len = res->r->data_sz;
> > + vfree(image->kernel_buf);
> > + image->kernel_buf = kernel;
> > + image->kernel_buf_len = kernel_len;
>
> Can't you assign the output to image->kernel_buf{_len} directly? Same
> for initrd and cmdline.
>
OK.
> > +
> > + res = &parsed_resource[1];
> > + if (!!res->r) {
> > + initrd = res->r->buf;
> > + initrd_len = res->r->data_sz;
> > + vfree(image->initrd_buf);
> > + image->initrd_buf = initrd;
> > + image->initrd_buf_len = initrd_len;
> > + }
> > + res = &parsed_resource[2];
> > + if (!!res->r) {
> > + cmdline = res->r->buf;
> > + cmdline_len = res->r->data_sz;
> > + kfree(image->cmdline_buf);
> > + image->cmdline_buf = cmdline;
> > + image->cmdline_buf_len = cmdline_len;
> > + }
> > +
> > + if (kernel == NULL || initrd == NULL || cmdline == NULL) {
> > + char *c, buf[64];
> > +
> > + c = buf;
> > + if (kernel == NULL) {
> > + strcpy(c, "kernel ");
> > + c += strlen("kernel ");
> > + }
> > + if (initrd == NULL) {
> > + strcpy(c, "initrd ");
> > + c += strlen("initrd ");
> > + }
> > + if (cmdline == NULL) {
> > + strcpy(c, "cmdline ");
> > + c += strlen("cmdline ");
> > + }
> > + c = '\0';
> > + pr_err("Can not extract data for %s", buf);
> > + ldata = ERR_PTR(-EINVAL);
> > + goto err;
> > + }
>
> This check needs to go. Not having a initrd or cmdline is not an error
> plus not having a kernel already throws an error above. In case you
> want to keep the error message for debugging purpose you can add it to
> the 'else' paths above.
>
Ah, it is valid to have either initrd or cmdline as NULL in
kexec_file_load. I will remove this chunk regarding to the check on
kernel done.
> > +
> > + ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
> > + image->kernel_buf_len);
> > + if (ret) {
> > + pr_err("Fail to find suitable image loader\n");
> > + ldata = ERR_PTR(ret);
> > + goto err;
> > + }
> > + ldata = kexec_image_load_default(image);
> > + if (IS_ERR(ldata)) {
> > + pr_err("architecture code fails to load image\n");
> > + goto err;
> > + }
> > + image->image_loader_data = ldata;
> > +
> > +err:
> > + for (int i = 0; i < 3; i++) {
>
> Can you please get rid of the magic '3', e.g. by using ARRAY_SIZE.
>
Yes.
Thank you for careful review.
Best Regards,
Pingfan
> Thanks
> Philipp
>
> > + r = parsed_resource[i].r;
> > + if (!r)
> > + continue;
> > + parsed_resource[i].r = NULL;
> > + /*
> > + * The release of buffer defers to
> > + * kimage_file_post_load_cleanup()
> > + */
> > + r->buf = NULL;
> > + r->buf_sz = 0;
> > + mem_range_result_put(r);
> > + }
> > +
> > + return ldata;
> > +}
> > +
> > +const struct kexec_file_ops kexec_pe_image_ops = {
> > + .probe = pe_image_probe,
> > + .load = pe_image_load,
> > +#ifdef CONFIG_KEXEC_IMAGE_VERIFY_SIG
> > + .verify_sig = kexec_kernel_verify_pe_sig,
> > +#endif
> > +};
>
next prev parent reply other threads:[~2025-09-17 13:04 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-19 1:24 [PATCHv5 00/12] kexec: Use BPF lskel to enable kexec to load PE format boot image Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 01/12] kexec_file: Make kexec_image_load_default global visible Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 02/12] lib/decompress: Keep decompressor when CONFIG_KEEP_COMPRESSOR Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 03/12] bpf: Introduce bpf_copy_to_kernel() to buffer the content from bpf-prog Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 04/12] bpf: Introduce decompressor kfunc Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 05/12] kexec: Introduce kexec_pe_image to parse and load PE file Pingfan Liu
2025-09-01 14:30 ` Philipp Rudo
2025-09-17 13:04 ` Pingfan Liu [this message]
2025-08-19 1:24 ` [PATCHv5 06/12] kexec: Integrate with the introduced bpf kfuncs Pingfan Liu
2025-09-01 14:30 ` Philipp Rudo
2025-09-16 6:52 ` Pingfan Liu
2025-09-18 13:43 ` Philipp Rudo
2025-09-19 1:26 ` Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 07/12] kexec: Introduce a bpf-prog lskel to parse PE file Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 08/12] kexec: Factor out routine to find a symbol in ELF Pingfan Liu
2025-09-01 14:31 ` Philipp Rudo
2025-09-16 1:27 ` Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 09/12] kexec: Integrate bpf light skeleton to load zboot image Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 10/12] arm64/kexec: Add PE image format support Pingfan Liu
2025-08-19 18:23 ` kernel test robot
2025-08-19 18:54 ` kernel test robot
2025-08-20 3:09 ` Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 11/12] tools/kexec: Introduce a bpf-prog to parse zboot image format Pingfan Liu
2025-08-19 1:24 ` [PATCHv5 12/12] tools/kexec: Add a zboot image building tool Pingfan Liu
2025-09-01 14:29 ` [PATCHv5 00/12] kexec: Use BPF lskel to enable kexec to load PE format boot image Philipp Rudo
2025-09-16 2:00 ` Pingfan Liu
2025-09-18 13:43 ` Philipp Rudo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aMqx3PLkYLp3FLOD@fedora \
--to=piliu@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=ardb@kernel.org \
--cc=ast@kernel.org \
--cc=bhe@redhat.com \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=daniel@iogearbox.net \
--cc=dyoung@redhat.com \
--cc=eddyz87@gmail.com \
--cc=horms@kernel.org \
--cc=jeremy.linton@arm.com \
--cc=john.fastabend@gmail.com \
--cc=kernel@jfarr.cc \
--cc=kexec@lists.infradead.org \
--cc=kraxel@redhat.com \
--cc=martin.lau@linux.dev \
--cc=prudo@redhat.com \
--cc=song@kernel.org \
--cc=systemd-devel@lists.freedesktop.org \
--cc=vkuznets@redhat.com \
--cc=vmalik@redhat.com \
--cc=will@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox