bpf.vger.kernel.org archive mirror
* [PATCH 0/2] tools/kexec: Introduce utility to build zboot image with bpf section
@ 2025-06-11  2:26 Pingfan Liu
  2025-06-11  2:26 ` [PATCH 1/2] tools/kexec: Introduce a bpf-prog to parse zboot image format Pingfan Liu
  2025-06-11  2:26 ` [PATCH 2/2] tools/kexec: Add a zboot image building tool Pingfan Liu
  0 siblings, 2 replies; 3+ messages in thread
From: Pingfan Liu @ 2025-06-11  2:26 UTC (permalink / raw)
  To: kexec
  Cc: Pingfan Liu, Alexei Starovoitov, Simon Horman, Philipp Rudo,
	Baoquan He, Dave Young, Andrew Morton, bpf

The series '[PATCHv3 0/9] kexec: Use BPF lskel to enable kexec to load PE
format boot image' [1] makes the kernel ready to load PE files with .bpf
sections. These two patches integrate the zboot-bpf image builder into
the kernel source tree, so users can conveniently generate the final image
with the command:
    make -C tools/kexec zboot
Later, the infrastructure for UKI can also be organized here.

To facilitate the review, I have pushed the whole series, including the kernel part,
to GitHub [2]. There is a slight difference from [PATCHv3 8/9] 
"kexec: Integrate bpf light skeleton to load zboot image" in [1]. This
difference is made in the function get_symbol_from_elf() to accommodate
clang's behavior, which combines the section header string table and normal
string table into one.

[1]: https://lore.kernel.org/bpf/20250529041744.16458-1-piliu@redhat.com/
[2]: https://github.com/pfliu/linux/tree/kexec_bpf_v3%2B

Pingfan Liu (2):
  tools/kexec: Introduce a bpf-prog to parse zboot image format
  tools/kexec: Add a zboot image building tool

 tools/kexec/Makefile              |  89 ++++++++++
 tools/kexec/pe.h                  | 177 +++++++++++++++++++
 tools/kexec/zboot_image_builder.c | 279 ++++++++++++++++++++++++++++++
 tools/kexec/zboot_parser_bpf.c    | 157 +++++++++++++++++
 4 files changed, 702 insertions(+)
 create mode 100644 tools/kexec/Makefile
 create mode 100644 tools/kexec/pe.h
 create mode 100644 tools/kexec/zboot_image_builder.c
 create mode 100644 tools/kexec/zboot_parser_bpf.c

-- 
2.49.0



* [PATCH 1/2] tools/kexec: Introduce a bpf-prog to parse zboot image format
  2025-06-11  2:26 [PATCH 0/2] tools/kexec: Introduce utility to build zboot image with bpf section Pingfan Liu
@ 2025-06-11  2:26 ` Pingfan Liu
  2025-06-11  2:26 ` [PATCH 2/2] tools/kexec: Add a zboot image building tool Pingfan Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Pingfan Liu @ 2025-06-11  2:26 UTC (permalink / raw)
  To: kexec
  Cc: Pingfan Liu, Alexei Starovoitov, Simon Horman, Philipp Rudo,
	Baoquan He, Dave Young, Andrew Morton, bpf

This BPF program aligns with the convention defined in the kernel file
kexec_pe_parser_bpf.lskel.h, where the interface between the BPF program
and the kernel is established, and is composed of:
    four maps:
                    struct bpf_map_desc ringbuf_1;
                    struct bpf_map_desc ringbuf_2;
                    struct bpf_map_desc ringbuf_3;
                    struct bpf_map_desc ringbuf_4;
    four sections:
                    struct bpf_map_desc rodata;
                    struct bpf_map_desc data;
                    struct bpf_map_desc bss;
                    struct bpf_map_desc rodata_str1_1;

    two progs:
            SEC("fentry.s/bpf_handle_pefile")
            SEC("fentry.s/bpf_post_handle_pefile")

This BPF program only uses ringbuf_1, so it minimizes the size of the
other three ringbufs to one byte.  The size of ringbuf_1 is deduced from
the size of the uncompressed file 'vmlinux.bin', which is usually less
than 64MB. With the help of a group of bpf kfuncs: bpf_decompress(),
bpf_copy_to_kernel(), bpf_mem_range_result_put(), this bpf-prog stores
the uncompressed kernel image inside the kernel space.
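
As a rough illustration of how the ringbuf_1 size follows from the size of
vmlinux.bin, the sketch below mirrors the rounding loop in the image_size.h
rule of the Makefile in this patch. The helper name ringbuf_size_for() is made
up for this example and is not part of the series.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the smallest power of two that is at least
 * 4096 and strictly greater than file_size. This matches the shell loop
 * "while [ POWER -le FILE_SIZE ]; do POWER=POWER*2; done" starting at 4096.
 * Sizes close to 2^63 are not handled; kernel images are far below that.
 */
static uint64_t ringbuf_size_for(uint64_t file_size)
{
	uint64_t power = 4096;

	while (power <= file_size)
		power *= 2;
	return power;
}
```

Under these assumptions, a 40 MiB vmlinux.bin yields a 64 MiB ring buffer, and
a file of exactly 4096 bytes yields 8192.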

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org
---
 tools/kexec/Makefile           |  81 +++++++++++++++
 tools/kexec/pe.h               | 177 +++++++++++++++++++++++++++++++++
 tools/kexec/zboot_parser_bpf.c | 157 +++++++++++++++++++++++++++++
 3 files changed, 415 insertions(+)
 create mode 100644 tools/kexec/Makefile
 create mode 100644 tools/kexec/pe.h
 create mode 100644 tools/kexec/zboot_parser_bpf.c

diff --git a/tools/kexec/Makefile b/tools/kexec/Makefile
new file mode 100644
index 0000000000000..49de2ab309a43
--- /dev/null
+++ b/tools/kexec/Makefile
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Ensure Kbuild variables are available
+include ../scripts/Makefile.include
+
+srctree := $(patsubst %/tools/kexec,%,$(CURDIR))
+VMLINUX = $(srctree)/vmlinux
+TOOLSDIR := $(srctree)/tools
+LIBDIR := $(TOOLSDIR)/lib
+BPFDIR := $(LIBDIR)/bpf
+ARCH ?= $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ -e s/aarch64.*/arm64/ -e s/riscv64/riscv/ -e s/loongarch.*/loongarch/)
+# At present, zboot image format is used by arm64, riscv, loongarch
+# And arch/$(ARCH)/boot/vmlinux.bin is the uncompressed file instead of arch/$(ARCH)/boot/Image
+ifeq ($(ARCH),$(filter $(ARCH),arm64 riscv loongarch))
+	EFI_IMAGE := $(srctree)/arch/$(ARCH)/boot/vmlinuz.efi
+	KERNEL_IMAGE := $(srctree)/arch/$(ARCH)/boot/vmlinux.bin
+else
+# Recipe-style commands are not valid outside a rule, so abort at parse time
+$(error Unsupported architecture: $(ARCH))
+endif
+
+
+CC = clang
+CFLAGS = -O2
+BPF_PROG_CFLAGS = -g -O2 -target bpf -Wall -I $(BPFDIR) -I .
+BPFTOOL = bpftool
+
+# List of generated target files
+HEADERS = vmlinux.h bpf_helper_defs.h image_size.h
+ZBOOT_TARGETS = bytecode.c zboot_parser_bpf.o bytecode.o
+
+
+# Targets
+zboot: $(HEADERS) $(ZBOOT_TARGETS)
+
+# Rule to generate vmlinux.h from vmlinux
+vmlinux.h: $(VMLINUX)
+	@$(BPFTOOL) btf dump file $(VMLINUX) format c > vmlinux.h
+
+bpf_helper_defs.h: $(srctree)/tools/include/uapi/linux/bpf.h
+	@$(QUIET_GEN)$(srctree)/scripts/bpf_doc.py --header \
+		--file $(srctree)/tools/include/uapi/linux/bpf.h > bpf_helper_defs.h
+
+image_size.h: $(KERNEL_IMAGE)
+	@{ \
+		if [ ! -f "$(KERNEL_IMAGE)" ]; then \
+			echo "Error: File '$(KERNEL_IMAGE)' does not exist"; \
+			exit 1; \
+		fi; \
+		FILE_SIZE=$$(stat -c '%s' "$(KERNEL_IMAGE)" 2>/dev/null); \
+		POWER=4096; \
+		while [ $$POWER -le $$FILE_SIZE ]; do \
+			POWER=$$((POWER * 2)); \
+		done; \
+		RINGBUF_SIZE=$$POWER; \
+		echo "#define RINGBUF1_SIZE $$RINGBUF_SIZE" > $@; \
+		echo "#define IMAGE_SIZE $$FILE_SIZE" >> $@; \
+	}
+
+
+# Rule to generate zboot_parser_bpf.o, depends on vmlinux.h
+zboot_parser_bpf.o: zboot_parser_bpf.c vmlinux.h bpf_helper_defs.h
+	@$(CC) $(BPF_PROG_CFLAGS) -c zboot_parser_bpf.c -o zboot_parser_bpf.o
+
+# Generate zboot_parser_bpf.lskel.h using bpftool
+# Then, extract the opts_data[] and opts_insn[] arrays and remove 'static'
+# keywords to avoid being optimized away.
+bytecode.c: zboot_parser_bpf.o
+	@$(BPFTOOL) gen skeleton -L zboot_parser_bpf.o > zboot_parser_bpf.lskel.h
+	@sed -n '/static const char opts_data\[\]/,/;/p' zboot_parser_bpf.lskel.h | sed 's/static const/const/' > $@
+	@sed -n '/static const char opts_insn\[\]/,/;/p' zboot_parser_bpf.lskel.h | sed 's/static const/const/' >> $@
+	@rm -f zboot_parser_bpf.lskel.h
+
+bytecode.o: bytecode.c
+	@$(CC) -c $< -o $@
+
+# Clean up generated files
+clean:
+	@rm -f $(HEADERS) $(ZBOOT_TARGETS)
+
+.PHONY: zboot clean
diff --git a/tools/kexec/pe.h b/tools/kexec/pe.h
new file mode 100644
index 0000000000000..9f1d086d6cf1a
--- /dev/null
+++ b/tools/kexec/pe.h
@@ -0,0 +1,177 @@
+/*
+ * Extract from linux kernel include/linux/pe.h
+ */
+
+#ifndef __PE_H__
+#define __PE_H__
+
+#define MZ_MAGIC	0x5a4d	/* "MZ" */
+#define PE_MAGIC		0x00004550	/* "PE\0\0" */
+
+struct mz_hdr {
+	uint16_t magic;		/* MZ_MAGIC */
+	uint16_t lbsize;	/* size of last used block */
+	uint16_t blocks;	/* pages in file, 0x3 */
+	uint16_t relocs;	/* relocations */
+	uint16_t hdrsize;	/* header size in "paragraphs" */
+	uint16_t min_extra_pps;	/* .bss */
+	uint16_t max_extra_pps;	/* runtime limit for the arena size */
+	uint16_t ss;		/* relative stack segment */
+	uint16_t sp;		/* initial %sp register */
+	uint16_t checksum;	/* word checksum */
+	uint16_t ip;		/* initial %ip register */
+	uint16_t cs;		/* initial %cs relative to load segment */
+	uint16_t reloc_table_offset;	/* offset of the first relocation */
+	uint16_t overlay_num;	/* overlay number.  set to 0. */
+	uint16_t reserved0[4];	/* reserved */
+	uint16_t oem_id;	/* oem identifier */
+	uint16_t oem_info;	/* oem specific */
+	uint16_t reserved1[10];	/* reserved */
+	uint32_t peaddr;	/* address of pe header */
+	char     message[];	/* message to print */
+};
+
+struct pe_hdr {
+	uint32_t magic;		/* PE magic */
+	uint16_t machine;	/* machine type */
+	uint16_t sections;	/* number of sections */
+	uint32_t timestamp;	/* time_t */
+	uint32_t symbol_table;	/* symbol table offset */
+	uint32_t symbols;	/* number of symbols */
+	uint16_t opt_hdr_size;	/* size of optional header */
+	uint16_t flags;		/* flags */
+};
+
+/* the fact that pe32 isn't padded where pe32+ is 64-bit means union won't
+ * work right.  vomit. */
+struct pe32_opt_hdr {
+	/* "standard" header */
+	uint16_t magic;		/* file type */
+	uint8_t  ld_major;	/* linker major version */
+	uint8_t  ld_minor;	/* linker minor version */
+	uint32_t text_size;	/* size of text section(s) */
+	uint32_t data_size;	/* size of data section(s) */
+	uint32_t bss_size;	/* size of bss section(s) */
+	uint32_t entry_point;	/* file offset of entry point */
+	uint32_t code_base;	/* relative code addr in ram */
+	uint32_t data_base;	/* relative data addr in ram */
+	/* "windows" header */
+	uint32_t image_base;	/* preferred load address */
+	uint32_t section_align;	/* alignment in bytes */
+	uint32_t file_align;	/* file alignment in bytes */
+	uint16_t os_major;	/* major OS version */
+	uint16_t os_minor;	/* minor OS version */
+	uint16_t image_major;	/* major image version */
+	uint16_t image_minor;	/* minor image version */
+	uint16_t subsys_major;	/* major subsystem version */
+	uint16_t subsys_minor;	/* minor subsystem version */
+	uint32_t win32_version;	/* reserved, must be 0 */
+	uint32_t image_size;	/* image size */
+	uint32_t header_size;	/* header size rounded up to
+				   file_align */
+	uint32_t csum;		/* checksum */
+	uint16_t subsys;	/* subsystem */
+	uint16_t dll_flags;	/* more flags! */
+	uint32_t stack_size_req;/* amt of stack requested */
+	uint32_t stack_size;	/* amt of stack required */
+	uint32_t heap_size_req;	/* amt of heap requested */
+	uint32_t heap_size;	/* amt of heap required */
+	uint32_t loader_flags;	/* reserved, must be 0 */
+	uint32_t data_dirs;	/* number of data dir entries */
+};
+
+struct pe32plus_opt_hdr {
+	uint16_t magic;		/* file type */
+	uint8_t  ld_major;	/* linker major version */
+	uint8_t  ld_minor;	/* linker minor version */
+	uint32_t text_size;	/* size of text section(s) */
+	uint32_t data_size;	/* size of data section(s) */
+	uint32_t bss_size;	/* size of bss section(s) */
+	uint32_t entry_point;	/* file offset of entry point */
+	uint32_t code_base;	/* relative code addr in ram */
+	/* "windows" header */
+	uint64_t image_base;	/* preferred load address */
+	uint32_t section_align;	/* alignment in bytes */
+	uint32_t file_align;	/* file alignment in bytes */
+	uint16_t os_major;	/* major OS version */
+	uint16_t os_minor;	/* minor OS version */
+	uint16_t image_major;	/* major image version */
+	uint16_t image_minor;	/* minor image version */
+	uint16_t subsys_major;	/* major subsystem version */
+	uint16_t subsys_minor;	/* minor subsystem version */
+	uint32_t win32_version;	/* reserved, must be 0 */
+	uint32_t image_size;	/* image size */
+	uint32_t header_size;	/* header size rounded up to
+				   file_align */
+	uint32_t csum;		/* checksum */
+	uint16_t subsys;	/* subsystem */
+	uint16_t dll_flags;	/* more flags! */
+	uint64_t stack_size_req;/* amt of stack requested */
+	uint64_t stack_size;	/* amt of stack required */
+	uint64_t heap_size_req;	/* amt of heap requested */
+	uint64_t heap_size;	/* amt of heap required */
+	uint32_t loader_flags;	/* reserved, must be 0 */
+	uint32_t data_dirs;	/* number of data dir entries */
+};
+
+struct data_dirent {
+	uint32_t virtual_address;	/* relative to load address */
+	uint32_t size;
+};
+
+struct data_directory {
+	struct data_dirent exports;		/* .edata */
+	struct data_dirent imports;		/* .idata */
+	struct data_dirent resources;		/* .rsrc */
+	struct data_dirent exceptions;		/* .pdata */
+	struct data_dirent certs;		/* certs */
+	struct data_dirent base_relocations;	/* .reloc */
+	struct data_dirent debug;		/* .debug */
+	struct data_dirent arch;		/* reserved */
+	struct data_dirent global_ptr;		/* global pointer reg. Size=0 */
+	struct data_dirent tls;			/* .tls */
+	struct data_dirent load_config;		/* load configuration structure */
+	struct data_dirent bound_imports;	/* no idea */
+	struct data_dirent import_addrs;	/* import address table */
+	struct data_dirent delay_imports;	/* delay-load import table */
+	struct data_dirent clr_runtime_hdr;	/* .cor (object only) */
+	struct data_dirent reserved;
+};
+
+struct section_header {
+	char name[8];			/* name or "/12\0" string tbl offset */
+	uint32_t virtual_size;		/* size of loaded section in ram */
+	uint32_t virtual_address;	/* relative virtual address */
+	uint32_t raw_data_size;		/* size of the section */
+	uint32_t data_addr;		/* file pointer to first page of sec */
+	uint32_t relocs;		/* file pointer to relocation entries */
+	uint32_t line_numbers;		/* line numbers! */
+	uint16_t num_relocs;		/* number of relocations */
+	uint16_t num_lin_numbers;	/* srsly. */
+	uint32_t flags;
+};
+
+struct win_certificate {
+	uint32_t length;
+	uint16_t revision;
+	uint16_t cert_type;
+};
+
+/*
+ * Return -1 if not PE, else offset of the PE header
+ */
+static int get_pehdr_offset(const char *buf)
+{
+	int pe_hdr_offset;
+
+	pe_hdr_offset = *((int *)(buf + 0x3c));
+	buf += pe_hdr_offset;
+	if (!!memcmp(buf, "PE\0\0", 4)) {
+		printf("Not a PE file\n");
+		return -1;
+	}
+
+	return pe_hdr_offset;
+}
+
+#endif
diff --git a/tools/kexec/zboot_parser_bpf.c b/tools/kexec/zboot_parser_bpf.c
new file mode 100644
index 0000000000000..3f038b34c641a
--- /dev/null
+++ b/tools/kexec/zboot_parser_bpf.c
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+#include "vmlinux.h"
+#include <bpf_helpers.h>
+#include <bpf_tracing.h>
+#include "image_size.h"
+
+/* Room for the uncompressed kernel image plus one page of slack */
+#define MAX_RECORD_SIZE	(IMAGE_SIZE + 4096)
+#define MIN_BUF_SIZE 1
+
+#define KEXEC_RES_KERNEL_NAME "kernel"
+#define KEXEC_RES_INITRD_NAME "initrd"
+#define KEXEC_RES_CMDLINE_NAME "cmdline"
+
+/* ringbuf is safe since the user space has no write access to them */
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, RINGBUF1_SIZE);
+} ringbuf_1 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, MIN_BUF_SIZE);
+} ringbuf_2 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, MIN_BUF_SIZE);
+} ringbuf_3 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_RINGBUF);
+	__uint(max_entries, MIN_BUF_SIZE);
+} ringbuf_4 SEC(".maps");
+
+char LICENSE[] SEC("license") = "GPL";
+
+/*
+ * This function ensures that the sections .rodata, .data .bss and .rodata.str1.1
+ * are created for a bpf prog.
+ */
+__attribute__((used)) static int dummy(void)
+{
+	static const char res_kernel[16] __attribute__((used, section(".rodata"))) = KEXEC_RES_KERNEL_NAME;
+	static char local_name[16] __attribute__((used, section(".data"))) = KEXEC_RES_CMDLINE_NAME;
+	static char res_cmdline[16] __attribute__((used, section(".bss")));
+
+	__builtin_memcpy(local_name, KEXEC_RES_INITRD_NAME, 16);
+	return __builtin_memcmp(local_name, res_kernel, 4);
+}
+
+extern int bpf_copy_to_kernel(const char *name, char *buf, int size) __weak __ksym;
+extern struct mem_range_result *bpf_decompress(char *image_gz_payload, int image_gz_sz) __weak __ksym;
+extern int bpf_mem_range_result_put(struct mem_range_result *result) __weak __ksym;
+
+
+
+
+/* see drivers/firmware/efi/libstub/zboot-header.S */
+struct linux_pe_zboot_header {
+	unsigned int mz_magic;
+	char image_type[4];
+	unsigned int payload_offset;
+	unsigned int payload_size;
+	unsigned int reserved[2];
+	char comp_type[4];
+	unsigned int linux_pe_magic;
+	unsigned int pe_header_offset;
+} __attribute__((packed));
+
+
+SEC("fentry.s/bpf_handle_pefile")
+int BPF_PROG(parse_pe, struct kexec_context *context)
+{
+	struct linux_pe_zboot_header *zboot_header;
+	unsigned int image_sz;
+	char *buf;
+	char local_name[32];
+
+	bpf_printk("begin parse PE\n");
+	/* BPF verifier should know each variable initial state */
+	if (!context->image || (context->image_sz > MAX_RECORD_SIZE)) {
+		bpf_printk("Err: image size is greater than 0x%lx\n", MAX_RECORD_SIZE);
+		return 0;
+	}
+
+	/* In order to access bytes not aligned on a power-of-two boundary, copy
+	 * them into the ringbuf. Reserve the memory once and overwrite it later.
+	 *
+	 * R2 is ARG_CONST_ALLOC_SIZE_OR_ZERO, so it must be known at compile time
+	 */
+	buf = (char *)bpf_ringbuf_reserve(&ringbuf_1, MAX_RECORD_SIZE, 0);
+	if (!buf) {
+	    	bpf_printk("Err: fail to reserve ringbuf to parse zboot header\n");
+		return 0;
+	}
+	image_sz = context->image_sz;
+	bpf_probe_read((void *)buf, sizeof(struct linux_pe_zboot_header), context->image);
+	zboot_header = (struct linux_pe_zboot_header *)buf;
+	if (!!__builtin_memcmp(&zboot_header->image_type, "zimg",
+			sizeof(zboot_header->image_type))) {
+		bpf_ringbuf_discard(buf, BPF_RB_NO_WAKEUP);
+		bpf_printk("Err: image is not zboot image\n");
+		return 0;
+	}
+
+	unsigned int payload_offset = zboot_header->payload_offset;
+	unsigned int payload_size = zboot_header->payload_size;
+	bpf_printk("zboot image payload offset=0x%x, size=0x%x\n", payload_offset, payload_size);
+	/* sanity check */
+	if (payload_size > image_sz) {
+		bpf_ringbuf_discard(buf, BPF_RB_NO_WAKEUP);
+		bpf_printk("Invalid zboot image payload offset and size\n");
+		return 0;
+	}
+	if (payload_size >= MAX_RECORD_SIZE ) {
+		bpf_ringbuf_discard(buf, BPF_RB_NO_WAKEUP);
+		bpf_printk("Err: payload_size > MAX_RECORD_SIZE\n");
+		return 0;
+	}
+	/* Overwrite buf */
+	bpf_probe_read((void *)buf, payload_size, context->image + payload_offset);
+	bpf_printk("Calling bpf_decompress()\n");
+	struct mem_range_result *r = bpf_decompress(buf, payload_size - 4);
+	if (!r) {
+		bpf_ringbuf_discard(buf, BPF_RB_NO_WAKEUP);
+		bpf_printk("Err: fail to decompress\n");
+		return 0;
+	}
+
+	image_sz = r->data_sz;
+	if (image_sz > MAX_RECORD_SIZE) {
+		bpf_ringbuf_discard(buf, BPF_RB_NO_WAKEUP);
+		bpf_mem_range_result_put(r);
+		bpf_printk("Err: decompressed size too big\n");
+		return 0;
+	}
+	
+	/* Since the decompressed size is bigger than original, no need to clean */
+	bpf_probe_read((void *)buf, image_sz, r->buf);
+	bpf_printk("Calling bpf_copy_to_kernel(), image_sz=0x%x\n", image_sz);
+	/* Verifier is unhappy to expose .rodata.str1.1 'map' to kernel */
+	__builtin_memcpy(local_name, KEXEC_RES_KERNEL_NAME, 32);
+	const char *res_name = local_name;
+	bpf_copy_to_kernel(res_name, buf, image_sz);
+	bpf_ringbuf_discard(buf, BPF_RB_NO_WAKEUP);
+	bpf_mem_range_result_put(r);
+
+	return 0;
+}
+
+SEC("fentry.s/bpf_post_handle_pefile")
+int BPF_PROG(post_parse_pe, struct kexec_context *context)
+{
+	return 0;
+}
-- 
2.49.0



* [PATCH 2/2] tools/kexec: Add a zboot image building tool
  2025-06-11  2:26 [PATCH 0/2] tools/kexec: Introduce utility to build zboot image with bpf section Pingfan Liu
  2025-06-11  2:26 ` [PATCH 1/2] tools/kexec: Introduce a bpf-prog to parse zboot image format Pingfan Liu
@ 2025-06-11  2:26 ` Pingfan Liu
  1 sibling, 0 replies; 3+ messages in thread
From: Pingfan Liu @ 2025-06-11  2:26 UTC (permalink / raw)
  To: kexec
  Cc: Pingfan Liu, Alexei Starovoitov, Simon Horman, Philipp Rudo,
	Baoquan He, Dave Young, Andrew Morton, bpf

The objcopy binary can append a section to a PE file, but it disregards
the DOS header. However, the zboot format carries important information
in the DOS header: payload offset and size.
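
To make the DOS-header fields concrete, here is a minimal sketch of reading
them from a file buffer. The field offsets (image type at 4, payload offset at
8, payload size at 12) follow struct linux_pe_zboot_header from patch 1/2. The
helper names, the little-endian decoding, and the bounds check are assumptions
made for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets taken from struct linux_pe_zboot_header in this series */
#define ZBOOT_IMAGE_TYPE_OFF		4
#define ZBOOT_PAYLOAD_OFFSET_OFF	8
#define ZBOOT_PAYLOAD_SIZE_OFF		12
#define ZBOOT_MIN_HDR_SIZE		16

/* Hypothetical result type, not part of the series */
struct zboot_payload {
	uint32_t offset;	/* payload offset from the start of the file */
	uint32_t size;		/* payload size in bytes */
};

/* Decode a 32-bit little-endian value without alignment requirements */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns 0 and fills *out on success. Returns -1 if the buffer is too
 * short, lacks the "MZ" or "zimg" markers, or describes a payload that
 * does not fit inside the buffer.
 */
static int zboot_read_payload(const unsigned char *buf, size_t len,
			      struct zboot_payload *out)
{
	uint32_t off, size;

	if (len < ZBOOT_MIN_HDR_SIZE)
		return -1;
	if (buf[0] != 'M' || buf[1] != 'Z')
		return -1;
	if (memcmp(buf + ZBOOT_IMAGE_TYPE_OFF, "zimg", 4) != 0)
		return -1;

	off = read_le32(buf + ZBOOT_PAYLOAD_OFFSET_OFF);
	size = read_le32(buf + ZBOOT_PAYLOAD_SIZE_OFF);
	/* Compare against the remaining length so the sum cannot overflow */
	if (off > len || size > len - off)
		return -1;

	out->offset = off;
	out->size = size;
	return 0;
}
```

A tool that rewrites the PE layout has to keep these two fields consistent
with the final file, which is the gap in objcopy's behavior described above.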

To preserve this information while appending a new PE section, this patch
introduces a dedicated tool that builds zboot images. The payload's relative
offset within the .data section remains unchanged. Therefore, the .data
section offset in the new PE file, plus the payload offset within that
section, yields the payload offset within the new PE file.

Finally, the new PE file 'zboot.efi' can be generated with the command:
  make -C tools/kexec zboot
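
The payload offset arithmetic described above can be sketched as follows.
This is a simplified illustration, not the code in zboot_image_builder.c:
struct sect_range and relocate_payload_offset() are hypothetical names, and
the explicit "not found" return value is an addition for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a PE section header */
struct sect_range {
	uint32_t data_addr;	/* file offset of the section's raw data */
	uint32_t raw_data_size;	/* size of the raw data in the file */
};

/*
 * Find the section of the old layout that contains old_off, then apply the
 * same in-section offset to the matching section of the new layout.
 * Returns 0 and stores the result in *new_off, or -1 if no section of the
 * old layout contains old_off.
 */
static int relocate_payload_offset(const struct sect_range *old_sects,
				   const struct sect_range *new_sects,
				   size_t nr, uint32_t old_off,
				   uint32_t *new_off)
{
	for (size_t i = 0; i < nr; i++) {
		uint32_t start = old_sects[i].data_addr;
		uint32_t size = old_sects[i].raw_data_size;

		/* Subtract first so the range check cannot overflow */
		if (old_off >= start && old_off - start < size) {
			*new_off = new_sects[i].data_addr + (old_off - start);
			return 0;
		}
	}
	return -1;
}
```

For example, if the payload sat 0xabc bytes into a section that moves from
file offset 0x3000 to 0x4000, the updated payload offset is 0x4abc.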

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: bpf@vger.kernel.org
To: kexec@lists.infradead.org
---
 tools/kexec/Makefile              |  10 +-
 tools/kexec/zboot_image_builder.c | 279 ++++++++++++++++++++++++++++++
 2 files changed, 288 insertions(+), 1 deletion(-)
 create mode 100644 tools/kexec/zboot_image_builder.c

diff --git a/tools/kexec/Makefile b/tools/kexec/Makefile
index 49de2ab309a43..335d9c15d6c0f 100644
--- a/tools/kexec/Makefile
+++ b/tools/kexec/Makefile
@@ -27,7 +27,7 @@ BPFTOOL = bpftool
 
 # List of generated target files
 HEADERS = vmlinux.h bpf_helper_defs.h image_size.h
-ZBOOT_TARGETS = bytecode.c zboot_parser_bpf.o bytecode.o
+ZBOOT_TARGETS = bytecode.c zboot_parser_bpf.o bytecode.o zboot_image_builder zboot.efi
 
 
 # Targets
@@ -74,6 +74,14 @@ bytecode.c: zboot_parser_bpf.o
 bytecode.o: bytecode.c
 	@$(CC) -c $< -o $@
 
+# Rule to build zboot_image_builder executable
+zboot_image_builder: zboot_image_builder.c
+	@$(CC) $(CFLAGS) $< -o $@
+
+zboot.efi: zboot_image_builder bytecode.o
+	@chmod +x zboot_image_builder
+	@./zboot_image_builder $(EFI_IMAGE) bytecode.o $@
+
 # Clean up generated files
 clean:
 	@rm -f $(HEADERS) $(ZBOOT_TARGETS)
diff --git a/tools/kexec/zboot_image_builder.c b/tools/kexec/zboot_image_builder.c
new file mode 100644
index 0000000000000..94632395a7ddc
--- /dev/null
+++ b/tools/kexec/zboot_image_builder.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * The zboot format carries the compressed kernel image offset and size
+ * information in the DOS header. The program appends a bpf section to PE file,
+ * meanwhile maintains the offset and size information, which is lost when using
+ * objcopy to handle zboot image.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include "pe.h"
+
+#ifdef DEBUG_DETAIL
+	#define dprintf(...) printf(__VA_ARGS__)
+#else
+	#define dprintf(...) ((void)0)
+#endif
+
+typedef struct {
+	union {
+		struct {
+			unsigned int mz_magic;
+			char image_type[4];
+			/* offset to the whole file start */
+			unsigned int payload_offset;
+			unsigned int payload_size;
+			unsigned int reserved[2];
+			char comp_type[4];
+		};
+		char raw_bytes[56];
+	};
+	unsigned int linux_pe_magic;
+	/* offset at: 0x3c or 60 */
+	unsigned int pe_header_offset;
+} __attribute__((packed)) pe_zboot_header;
+
+typedef unsigned long	uintptr_t;
+#define ALIGN_UP(p, size) (__typeof__(p))(((uintptr_t)(p) + ((size) - 1)) & ~((size) - 1))
+
+int main(int argc, char **argv)
+{
+	uint32_t payload_new_offset, payload_sect_off;
+	uint32_t payload_size;
+	uint32_t payload_sect_idx;
+	pe_zboot_header *zheader;
+	struct pe_hdr *pe_hdr;
+	struct pe32plus_opt_hdr *opt_hdr;
+	int base_fd, bpf_fd, out_fd;
+	char *base_start_addr, *base_cur;
+	char *out_start_addr, *out_cur;
+	uint32_t out_sz, max_va_end = 0;
+	struct stat sb;
+	int i = 0, ret = 0;
+
+	if (argc != 4) {
+	    fprintf(stderr, "Usage: %s <original_pe> <binary_file> <new_pe>\n", argv[0]);
+	    return -1;
+	}
+
+	const char *original_pe = argv[1];
+	const char *binary_file = argv[2];
+	const char *new_pe = argv[3];
+	FILE *bin_fp = fopen(binary_file, "rb");
+	if (!bin_fp) {
+	    perror("Failed to open binary file");
+	    return -1;
+	}
+	fseek(bin_fp, 0, SEEK_END);
+	size_t bin_size = ftell(bin_fp);
+	fseek(bin_fp, 0, SEEK_SET);
+	base_fd = open(original_pe, O_RDWR);
+	out_fd = open(new_pe, O_RDWR | O_CREAT, 0644);
+	if (base_fd == -1 || out_fd == -1) {
+	    perror("Error opening file");
+	    exit(1);
+	}
+
+	if (fstat(base_fd, &sb) == -1) {
+	    perror("Error getting file size");
+	    exit(1);
+	}
+	base_start_addr = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, base_fd, 0);
+	if (base_start_addr == MAP_FAILED) {
+	    perror("Error mmapping the file");
+	    exit(1);
+	}
+	/* 64KB for section table extending */
+	out_sz = sb.st_size + bin_size + (1 << 16);
+	out_start_addr = mmap(NULL, out_sz, PROT_WRITE, MAP_SHARED, out_fd, 0);
+	if (ftruncate(out_fd, out_sz) == -1) {
+		perror("Failed to resize output file");
+		ret = -1;
+		goto err;
+	}
+	if (out_start_addr == MAP_FAILED) {
+	    perror("Error mmapping the file");
+	    exit(1);
+	}
+
+	zheader = (pe_zboot_header *)base_start_addr;
+	if (zheader->mz_magic != 0x5A4D) {  // 'MZ'
+	    fprintf(stderr, "Invalid DOS signature\n");
+	    return -1;
+	}
+	uint32_t pe_hdr_offset = get_pehdr_offset((const char *)base_start_addr);
+	base_cur = base_start_addr + pe_hdr_offset;
+	pe_hdr = (struct pe_hdr *)base_cur;
+	if (pe_hdr->magic!= 0x00004550) {  // 'PE\0\0'
+	    fprintf(stderr, "Invalid PE signature\n");
+	    return -1;
+	}
+	base_cur += sizeof(struct pe_hdr);
+	opt_hdr = (struct pe32plus_opt_hdr *)base_cur;
+	uint32_t file_align = opt_hdr->file_align;
+	uint32_t section_alignment = opt_hdr->section_align;
+
+	uint16_t num_sections = pe_hdr->sections;
+	struct section_header *base_sections, *sect;
+	uint32_t section_table_offset = pe_hdr_offset + sizeof(struct pe_hdr) + pe_hdr->opt_hdr_size;
+	base_sections = (struct section_header *)(base_start_addr + section_table_offset);
+
+	/* Decide the section idx and the payload offset within the section */
+	for (i = 0; i < num_sections; i++) {
+	    sect = &base_sections[i];
+	    if (zheader->payload_offset >= sect->data_addr &&
+		zheader->payload_offset < (sect->data_addr + sect->raw_data_size)) {
+		    payload_sect_idx = i;
+		    payload_sect_off = zheader->payload_offset - sect->data_addr;
+	    }
+	}
+
+	/* Calculate the end of the last section in virtual memory */
+	for (i = 0; i < num_sections; i++) {
+	    uint32_t section_end = base_sections[i].virtual_address + base_sections[i].virtual_size;
+	    if (section_end > max_va_end) {
+	        max_va_end = section_end;
+	    }
+	}
+
+	/* Calculate virtual address for the new .bpf section */
+	uint32_t bpf_virtual_address = ALIGN_UP(max_va_end, section_alignment);
+
+	pe_zboot_header *new_zhdr = malloc(sizeof(pe_zboot_header));
+	memcpy(new_zhdr, zheader, sizeof(pe_zboot_header));
+	struct pe_hdr *new_hdr = malloc(sizeof(struct pe_hdr));
+	memcpy(new_hdr, pe_hdr, sizeof(struct pe_hdr));
+	new_hdr->sections += 1;
+	struct pe32plus_opt_hdr *new_opt_hdr = malloc(pe_hdr->opt_hdr_size);
+	memcpy(new_opt_hdr, opt_hdr, pe_hdr->opt_hdr_size);
+	/* Create new section headers array (original + new section) */
+	struct section_header *new_sections = calloc(1, new_hdr->sections * sizeof(struct section_header));
+	if (!new_sections) {
+	    perror("Failed to allocate memory for new section headers");
+	    return -1;
+	}
+	memcpy(new_sections, base_sections, pe_hdr->sections * sizeof(struct section_header));
+
+	/* Configure the new .bpf section */
+	struct section_header *bpf_section = &new_sections[new_hdr->sections - 1];
+	memset(bpf_section, 0, sizeof(struct section_header));
+	strncpy((char *)bpf_section->name, ".bpf", 8);
+	bpf_section->virtual_size = bin_size;
+	bpf_section->virtual_address = bpf_virtual_address;
+	bpf_section->raw_data_size = bin_size;
+	bpf_section->flags = 0x40000000; //Readable
+
+	/* Update headers */
+	uint32_t new_size_of_image = bpf_section->virtual_address + bpf_section->virtual_size;
+	new_size_of_image = ALIGN_UP(new_size_of_image, section_alignment);
+	new_opt_hdr->image_size = new_size_of_image;
+
+	size_t section_table_size = new_hdr->sections * (sizeof(struct section_header));
+	size_t headers_size = section_table_offset + section_table_size;
+	size_t aligned_headers_size = ALIGN_UP(headers_size, file_align);
+	new_opt_hdr->header_size = aligned_headers_size;
+
+
+	uint32_t current_offset = aligned_headers_size;
+	/*
+	 * If the original PE data_addr is covered by enlarged header_size
+	 * re-assign new data_addr for all sections
+	 */
+	if (base_sections[0].data_addr < aligned_headers_size) {
+		for (i = 0; i < new_hdr->sections; i++) {
+		    new_sections[i].data_addr = current_offset;
+		    current_offset += ALIGN_UP(new_sections[i].raw_data_size, file_align);
+		}
+	/* Keep unchanged, just allocating file pointer for bpf section */
+	} else {
+		uint32_t t;
+		i = new_hdr->sections - 2;
+		t = new_sections[i].data_addr + new_sections[i].raw_data_size;
+		i++;
+		new_sections[i].data_addr = ALIGN_UP(t, file_align);
+	}
+
+	payload_new_offset = new_sections[payload_sect_idx].data_addr + payload_sect_off;
+	/* Update */
+	new_zhdr->payload_offset = payload_new_offset;
+	new_zhdr->payload_size = zheader->payload_size;
+	dprintf("zboot payload_offset updated from 0x%x to 0x%x, size:0x%x\n",
+		zheader->payload_offset, payload_new_offset, new_zhdr->payload_size);
+
+
+	/* compose the new PE file */
+
+	/* Write Dos header */
+	memcpy(out_start_addr, new_zhdr, sizeof(pe_zboot_header));
+	out_cur = out_start_addr + pe_hdr_offset;
+
+	/* Write PE header */
+	memcpy(out_cur, new_hdr, sizeof(struct pe_hdr));
+	out_cur += sizeof(struct pe_hdr);
+
+	/* Write PE optional header */
+	memcpy(out_cur, new_opt_hdr, new_hdr->opt_hdr_size);
+	out_cur += new_hdr->opt_hdr_size;
+
+	/* Write all section headers */
+	memcpy(out_cur, new_sections, new_hdr->sections * sizeof(struct section_header));
+
+	/* Skip padding and copy the section data */
+	for (i = 0; i < pe_hdr->sections; i++) {
+		base_cur = base_start_addr + base_sections[i].data_addr;
+		out_cur = out_start_addr + new_sections[i].data_addr;
+		memcpy(out_cur, base_cur, base_sections[i].raw_data_size);
+	}
+	msync(out_start_addr, new_sections[i].data_addr + new_sections[i].raw_data_size, MS_ASYNC);
+	/* For the bpf section */
+	out_cur = out_start_addr + new_sections[i].data_addr;
+
+	/* Write .bpf section data */
+	char *bin_data = calloc(1, bin_size);
+	if (!bin_data) {
+		perror("Failed to allocate memory for binary data");
+		free(new_opt_hdr);
+		free(new_sections);
+		ret = -1;
+		goto err;
+	}
+	if (fread(bin_data, bin_size, 1, bin_fp) != 1) {
+		perror("Failed to read binary data");
+		free(bin_data);
+		free(new_opt_hdr);
+		free(new_sections);
+		ret = -1;
+		goto err;
+	}
+
+	if (out_cur + bin_size > out_start_addr + out_sz) {
+	    fprintf(stderr, "out of out_fd mmap\n");
+	    ret = -1;
+	    goto err;
+	}
+	memcpy(out_cur, bin_data, bin_size);
+	/* calculate the real size */
+	out_sz = out_cur + bin_size - out_start_addr;
+	msync(out_start_addr, out_sz, MS_ASYNC);
+	/* truncate to the real size */
+	if (ftruncate(out_fd, out_sz) == -1) {
+		perror("Failed to resize output file");
+		ret = -1;
+		goto err;
+	}
+	printf("Create a new PE file with bpf section: %s\n", new_pe);
+err:
+	munmap(out_start_addr, out_sz);
+	munmap(base_start_addr, sb.st_size);
+	close(base_fd);
+	close(out_fd);
+	fclose(bin_fp);
+
+	return ret;
+}
-- 
2.49.0

