rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] MODVERSIONS + RUST Redux
@ 2023-11-18  2:54 Matthew Maurer
  2023-11-18  2:54 ` [PATCH v2 1/5] export_report: Rehabilitate script Matthew Maurer
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Matthew Maurer @ 2023-11-18  2:54 UTC (permalink / raw)
  To: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain
  Cc: Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott,
	Matthew Maurer

The goal of this patch series is to allow MODVERSIONS and RUST to be
enabled simultaneously. The primary issue with doing this at the moment
is that Rust uses some extremely long symbol names - for those
unfamiliar with Rust, it may be helpful to think of some of the mangled
C++ names you may have seen in binaries in the past.

Previously, Gary Guo attempted to accomplish this by modifying the
existing modversion format [1] to support variable-length symbol names.
This was unfortunately considered to be a potential userspace break
because kmod tools inspect this kernel module metadata. Masahiro Yamada
suggested [2] that this could instead be done with a section per-field.
This gives us the ability to be more flexible with this format in the
future, as a new field or additional information will be in a new
section which userspace tools will not yet attempt to read.

In the previous version of this patchset, Luis Chamberlain suggested [3]
I move validation out of the version checking and into the elf validity
checker, and also add kernel-docs over there. I found
elf_validity_cached_copy to be fairly dense and difficult to directly
describe, so I refactored it into easier to explain pieces. In the
process, I found a few missing checks and added those as well. See
[PATCH 2/5] for more details. If this is too much, I'm more than happy
to drop this patch from the series in favor of just adding the
kernel-doc to the original code, but figured I'd offer it up in case the
added clarity and checks were valuable.

[1] https://lore.kernel.org/lkml/20230111161155.1349375-1-gary@garyguo.net/
[2] https://lore.kernel.org/lkml/CAK7LNATsuszFR7JB5ZkqVS1W=hWr9=E7bTf+MvgJ+NXT3aZNwg@mail.gmail.com/
[3] https://lore.kernel.org/lkml/ZVZNh%2FPA5HiVRkeb@bombadil.infradead.org/

Matthew Maurer (5):
  export_report: Rehabilitate script
  modules: Refactor + kdoc elf_validity_cached_copy
  modpost: Extended modversion support
  rust: Allow MODVERSIONS
  export_report: Use new version info format

 arch/powerpc/kernel/module_64.c |  25 +-
 init/Kconfig                    |   1 -
 kernel/module/internal.h        |  18 +-
 kernel/module/main.c            | 663 +++++++++++++++++++++++++-------
 kernel/module/version.c         |  43 +++
 scripts/export_report.pl        |  17 +-
 scripts/mod/modpost.c           |  37 +-
 7 files changed, 642 insertions(+), 162 deletions(-)

-- 
2.43.0.rc0.421.g78406f8d94-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/5] export_report: Rehabilitate script
  2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
@ 2023-11-18  2:54 ` Matthew Maurer
  2023-11-18 11:35   ` Greg KH
  2023-11-18  2:54 ` [PATCH v2 2/5] modules: Refactor + kdoc elf_validity_cached_copy Matthew Maurer
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Matthew Maurer @ 2023-11-18  2:54 UTC (permalink / raw)
  To: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain
  Cc: Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott,
	Matthew Maurer

* modules.order has .o files when in a build dir, support this
* .mod.c source layout has changed, update regexes to match
* Add a stage 3, to be more robust against additional .mod.c content

Signed-off-by: Matthew Maurer <mmaurer@google.com>
---
 scripts/export_report.pl | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/scripts/export_report.pl b/scripts/export_report.pl
index feb3d5542a62..dcef915405f3 100755
--- a/scripts/export_report.pl
+++ b/scripts/export_report.pl
@@ -55,6 +55,7 @@ sub collectcfiles {
     open my $fh, '< modules.order' or die "cannot open modules.order: $!\n";
     while (<$fh>) {
 	s/\.ko$/.mod.c/;
+	s/\.o$/.mod.c/;
 	push (@file, $_)
     }
     close($fh);
@@ -120,10 +121,14 @@ foreach my $thismod (@allcfiles) {
 			next;
 		}
 		if ($state == 1) {
-			$state = 2 if ($_ =~ /__attribute__\(\(section\("__versions"\)\)\)/);
+			$state = 2 if ($_ =~ /__used __section\("__versions"\)/);
 			next;
 		}
 		if ($state == 2) {
+			if ( $_ =~ /};/ ) {
+				$state = 3;
+				next;
+			}
 			if ( $_ !~ /0x[0-9a-f]+,/ ) {
 				next;
 			}
@@ -133,7 +138,7 @@ foreach my $thismod (@allcfiles) {
 			push(@{$MODULE{$thismod}} , $sym);
 		}
 	}
-	if ($state != 2) {
+	if ($state != 3) {
 		warn "WARNING:$thismod is not built with CONFIG_MODVERSIONS enabled\n";
 		$modversion_warnings++;
 	}
-- 
2.43.0.rc0.421.g78406f8d94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/5] modules: Refactor + kdoc elf_validity_cached_copy
  2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
  2023-11-18  2:54 ` [PATCH v2 1/5] export_report: Rehabilitate script Matthew Maurer
@ 2023-11-18  2:54 ` Matthew Maurer
  2023-11-18 11:36   ` Greg KH
  2023-11-18  2:54 ` [PATCH v2 3/5] modpost: Extended modversion support Matthew Maurer
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Matthew Maurer @ 2023-11-18  2:54 UTC (permalink / raw)
  To: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain
  Cc: Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott,
	Matthew Maurer

Functionality is almost identical, just structured for better
documentation and readability. Changes:

* Section names are checked for *all* non-SHT_NULL sections, not just
  SHF_ALLOC sections. We have code that accesses section names of
  non-SHF_ALLOC sections (see find_any_sec for example)
* The section name check occurs *before* strcmping on section names.
  Previously, it was possible to use an out-of-bounds offset to strcmp
  against ".modinfo" or ".gnu.linkonce.this_module"
* strtab is checked for NUL lead+termination and nonzero size
* The symbol table is swept to ensure offsets are inbounds of strtab

While some of these oversights would normally be worrying, all of the
potentially unverified accesses occur after signature check, and only in
response to a user who can load a kernel module.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
---
 kernel/module/internal.h |   7 +-
 kernel/module/main.c     | 585 +++++++++++++++++++++++++++++----------
 2 files changed, 444 insertions(+), 148 deletions(-)

diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index c8b7b4dcf782..d8dc52eb9c82 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -80,7 +80,12 @@ struct load_info {
 	unsigned int used_pages;
 #endif
 	struct {
-		unsigned int sym, str, mod, vers, info, pcpu;
+		unsigned int sym;
+		unsigned int str;
+		unsigned int mod;
+		unsigned int vers;
+		unsigned int info;
+		unsigned int pcpu;
 	} index;
 };
 
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..8b2848b3183a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -193,6 +193,38 @@ static unsigned int find_sec(const struct load_info *info, const char *name)
 	return 0;
 }
 
+/**
+ * find_any_unique_sec() - Find a unique section index by name
+ * @info: Load info for the module to scan
+ * @name: Name of the section we're looking for
+ *
+ * Locates a unique section by name. Ignores SHF_ALLOC.
+ *
+ * Return: Section index if found uniquely, zero if absent, negative count
+ *         of total instances if multiple were found.
+ */
+static int find_any_unique_sec(const struct load_info *info, const char *name)
+{
+	unsigned int idx;
+	unsigned int count = 0;
+	int i;
+
+	for (i = 1; i < info->hdr->e_shnum; i++) {
+		if (strcmp(info->secstrings + info->sechdrs[i].sh_name,
+			   name) == 0) {
+			count++;
+			idx = i;
+		}
+	}
+	if (count == 1) {
+		return idx;
+	} else if (count == 0) {
+		return 0;
+	} else {
+		return -count;
+	}
+}
+
 /* Find a module section, or NULL. */
 static void *section_addr(const struct load_info *info, const char *name)
 {
@@ -1627,7 +1659,7 @@ bool __weak module_exit_section(const char *name)
 	return strstarts(name, ".exit");
 }
 
-static int validate_section_offset(struct load_info *info, Elf_Shdr *shdr)
+static int validate_section_offset(const struct load_info *info, Elf_Shdr *shdr)
 {
 #if defined(CONFIG_64BIT)
 	unsigned long long secend;
@@ -1646,62 +1678,80 @@ static int validate_section_offset(struct load_info *info, Elf_Shdr *shdr)
 	return 0;
 }
 
-/*
- * Check userspace passed ELF module against our expectations, and cache
- * useful variables for further processing as we go.
- *
- * This does basic validity checks against section offsets and sizes, the
- * section name string table, and the indices used for it (sh_name).
+/**
+ * elf_validity_ehdr() - Checks an ELF header for module validity
+ * @info: Load info containing the ELF header to check
  *
- * As a last step, since we're already checking the ELF sections we cache
- * useful variables which will be used later for our convenience:
+ * Checks whether an ELF header could belong to a valid module. Checks:
  *
- * 	o pointers to section headers
- * 	o cache the modinfo symbol section
- * 	o cache the string symbol section
- * 	o cache the module section
+ * * ELF header is within the data the user provided
+ * * ELF magic is present
+ * * It is relocatable (not final linked, not core file, etc.)
+ * * The header's machine type matches what the architecture expects.
+ * * Optional arch-specific hook for other properties
+ *   - module_elf_check_arch() is currently only used by PPC to check
+ *   ELF ABI version, but may be used by others in the future.
  *
- * As a last step we set info->mod to the temporary copy of the module in
- * info->hdr. The final one will be allocated in move_module(). Any
- * modifications we make to our copy of the module will be carried over
- * to the final minted module.
+ * Return: %0 if valid, %-ENOEXEC on failure.
  */
-static int elf_validity_cache_copy(struct load_info *info, int flags)
+static int elf_validity_ehdr(const struct load_info *info)
 {
-	unsigned int i;
-	Elf_Shdr *shdr, *strhdr;
-	int err;
-	unsigned int num_mod_secs = 0, mod_idx;
-	unsigned int num_info_secs = 0, info_idx;
-	unsigned int num_sym_secs = 0, sym_idx;
-
 	if (info->len < sizeof(*(info->hdr))) {
 		pr_err("Invalid ELF header len %lu\n", info->len);
-		goto no_exec;
+		return -ENOEXEC;
 	}
-
 	if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0) {
 		pr_err("Invalid ELF header magic: != %s\n", ELFMAG);
-		goto no_exec;
+		return -ENOEXEC;
 	}
 	if (info->hdr->e_type != ET_REL) {
 		pr_err("Invalid ELF header type: %u != %u\n",
 		       info->hdr->e_type, ET_REL);
-		goto no_exec;
+		return -ENOEXEC;
 	}
 	if (!elf_check_arch(info->hdr)) {
 		pr_err("Invalid architecture in ELF header: %u\n",
 		       info->hdr->e_machine);
-		goto no_exec;
+		return -ENOEXEC;
 	}
 	if (!module_elf_check_arch(info->hdr)) {
 		pr_err("Invalid module architecture in ELF header: %u\n",
 		       info->hdr->e_machine);
-		goto no_exec;
+		return -ENOEXEC;
 	}
+	return 0;
+}
+
+/**
+ * elf_validity_cache_sechdrs() - Cache section headers if valid
+ * @info: Load info to compute section headers from
+ *
+ * Checks:
+ *
+ * * ELF header is valid (see elf_validity_ehdr())
+ * * Section headers are the size we expect
+ * * Section array fits in the user provided data
+ * * Section index 0 is NULL
+ * * Section contents are inbounds
+ *
+ * Then updates @info with a &load_info->sechdrs pointer if valid.
+ *
+ * Return: %0 if valid, negative error code if validation failed.
+ */
+static int elf_validity_cache_sechdrs(struct load_info *info)
+{
+	Elf_Shdr *sechdrs;
+	Elf_Shdr *shdr;
+	int i;
+	int err;
+
+	err = elf_validity_ehdr(info);
+	if (err < 0)
+		return err;
+
 	if (info->hdr->e_shentsize != sizeof(Elf_Shdr)) {
 		pr_err("Invalid ELF section header size\n");
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
 	/*
@@ -1713,10 +1763,66 @@ static int elf_validity_cache_copy(struct load_info *info, int flags)
 	    || (info->hdr->e_shnum * sizeof(Elf_Shdr) >
 		info->len - info->hdr->e_shoff)) {
 		pr_err("Invalid ELF section header overflow\n");
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
-	info->sechdrs = (void *)info->hdr + info->hdr->e_shoff;
+	sechdrs = (void *)info->hdr + info->hdr->e_shoff;
+
+	/*
+	 * The code assumes that section 0 has a length of zero and
+	 * an addr of zero, so check for it.
+	 */
+	if (sechdrs[0].sh_type != SHT_NULL
+	    || sechdrs[0].sh_size != 0
+	    || sechdrs[0].sh_addr != 0) {
+		pr_err("ELF Spec violation: section 0 type(%d)!=SH_NULL or non-zero len or addr\n",
+		       sechdrs[0].sh_type);
+		return -ENOEXEC;
+	}
+
+	/* Validate contents are inbounds */
+	for (i = 1; i < info->hdr->e_shnum; i++) {
+		shdr = &sechdrs[i];
+		switch (shdr->sh_type) {
+		case SHT_NULL:
+		case SHT_NOBITS:
+			/* No contents, offset/size don't mean anything */
+			continue;
+		default:
+			err = validate_section_offset(info, shdr);
+			if (err < 0) {
+				pr_err("Invalid ELF section in module (section %u type %u)\n",
+				       i, shdr->sh_type);
+				return err;
+			}
+		}
+	}
+
+	info->sechdrs = sechdrs;
+
+	return 0;
+}
+
+/**
+ * elf_validity_cache_secstrings() - Caches section names if valid
+ * @info: Load info to cache section names from. Must have valid sechdrs.
+ *
+ * Specifically checks:
+ *
+ * * Section name table index is inbounds of section headers
+ * * Section name table is not empty
+ * * Section name table is NUL terminated
+ * * All section name offsets are inbounds of the section
+ *
+ * Then updates @info with a &load_info->secstrings pointer if valid.
+ *
+ * Return: %0 if valid, negative error code if validation failed.
+ */
+static int elf_validity_cache_secstrings(struct load_info *info)
+{
+	Elf_Shdr *strhdr, *shdr;
+	char *secstrings;
+	int i;
 
 	/*
 	 * Verify if the section name table index is valid.
@@ -1726,165 +1832,234 @@ static int elf_validity_cache_copy(struct load_info *info, int flags)
 		pr_err("Invalid ELF section name index: %d || e_shstrndx (%d) >= e_shnum (%d)\n",
 		       info->hdr->e_shstrndx, info->hdr->e_shstrndx,
 		       info->hdr->e_shnum);
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
 	strhdr = &info->sechdrs[info->hdr->e_shstrndx];
-	err = validate_section_offset(info, strhdr);
-	if (err < 0) {
-		pr_err("Invalid ELF section hdr(type %u)\n", strhdr->sh_type);
-		return err;
-	}
 
 	/*
 	 * The section name table must be NUL-terminated, as required
 	 * by the spec. This makes strcmp and pr_* calls that access
 	 * strings in the section safe.
 	 */
-	info->secstrings = (void *)info->hdr + strhdr->sh_offset;
+	secstrings = (void *)info->hdr + strhdr->sh_offset;
 	if (strhdr->sh_size == 0) {
 		pr_err("empty section name table\n");
-		goto no_exec;
+		return -ENOEXEC;
 	}
-	if (info->secstrings[strhdr->sh_size - 1] != '\0') {
+	if (secstrings[strhdr->sh_size - 1] != '\0') {
 		pr_err("ELF Spec violation: section name table isn't null terminated\n");
-		goto no_exec;
-	}
-
-	/*
-	 * The code assumes that section 0 has a length of zero and
-	 * an addr of zero, so check for it.
-	 */
-	if (info->sechdrs[0].sh_type != SHT_NULL
-	    || info->sechdrs[0].sh_size != 0
-	    || info->sechdrs[0].sh_addr != 0) {
-		pr_err("ELF Spec violation: section 0 type(%d)!=SH_NULL or non-zero len or addr\n",
-		       info->sechdrs[0].sh_type);
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
-	for (i = 1; i < info->hdr->e_shnum; i++) {
+	for (i = 0; i < info->hdr->e_shnum; i++) {
 		shdr = &info->sechdrs[i];
-		switch (shdr->sh_type) {
-		case SHT_NULL:
-		case SHT_NOBITS:
+		/* SHT_NULL means sh_name has an undefined value */
+		if (shdr->sh_type == SHT_NULL)
 			continue;
-		case SHT_SYMTAB:
-			if (shdr->sh_link == SHN_UNDEF
-			    || shdr->sh_link >= info->hdr->e_shnum) {
-				pr_err("Invalid ELF sh_link!=SHN_UNDEF(%d) or (sh_link(%d) >= hdr->e_shnum(%d)\n",
-				       shdr->sh_link, shdr->sh_link,
-				       info->hdr->e_shnum);
-				goto no_exec;
-			}
-			num_sym_secs++;
-			sym_idx = i;
-			fallthrough;
-		default:
-			err = validate_section_offset(info, shdr);
-			if (err < 0) {
-				pr_err("Invalid ELF section in module (section %u type %u)\n",
-					i, shdr->sh_type);
-				return err;
-			}
-			if (strcmp(info->secstrings + shdr->sh_name,
-				   ".gnu.linkonce.this_module") == 0) {
-				num_mod_secs++;
-				mod_idx = i;
-			} else if (strcmp(info->secstrings + shdr->sh_name,
-				   ".modinfo") == 0) {
-				num_info_secs++;
-				info_idx = i;
-			}
-
-			if (shdr->sh_flags & SHF_ALLOC) {
-				if (shdr->sh_name >= strhdr->sh_size) {
-					pr_err("Invalid ELF section name in module (section %u type %u)\n",
-					       i, shdr->sh_type);
-					return -ENOEXEC;
-				}
-			}
-			break;
+		if (shdr->sh_name >= strhdr->sh_size) {
+			pr_err("Invalid ELF section name in module (section %u type %u)\n",
+			       i, shdr->sh_type);
+			return -ENOEXEC;
 		}
 	}
 
-	if (num_info_secs > 1) {
+	info->secstrings = secstrings;
+	return 0;
+}
+
+/**
+ * elf_validity_cache_index_info() - Validate and cache modinfo section
+ * @info: Load info to populate the modinfo index on.
+ *        Must have &load_info->sechdrs and &load_info->secstrings populated
+ *
+ * Checks that if there is a .modinfo section, it is unique.
+ * Then, it caches its index in &load_info->index.info.
+ * Finally, it tries to populate the name to improve error messages.
+ *
+ * Return: %0 if valid, %-ENOEXEC if multiple modinfo sections were found.
+ */
+static int elf_validity_cache_index_info(struct load_info *info)
+{
+	int info_idx;
+
+	info_idx = find_any_unique_sec(info, ".modinfo");
+
+	if (info_idx == 0)
+		/* Early return, no .modinfo */
+		return 0;
+
+	if (info_idx < 0) {
 		pr_err("Only one .modinfo section must exist.\n");
-		goto no_exec;
-	} else if (num_info_secs == 1) {
-		/* Try to find a name early so we can log errors with a module name */
-		info->index.info = info_idx;
-		info->name = get_modinfo(info, "name");
+		return -ENOEXEC;
 	}
 
-	if (num_sym_secs != 1) {
-		pr_warn("%s: module has no symbols (stripped?)\n",
-			info->name ?: "(missing .modinfo section or name field)");
-		goto no_exec;
-	}
+	info->index.info = info_idx;
+	/* Try to find a name early so we can log errors with a module name */
+	info->name = get_modinfo(info, "name");
 
-	/* Sets internal symbols and strings. */
-	info->index.sym = sym_idx;
-	shdr = &info->sechdrs[sym_idx];
-	info->index.str = shdr->sh_link;
-	info->strtab = (char *)info->hdr + info->sechdrs[info->index.str].sh_offset;
+	return 0;
+}
 
-	/*
-	 * The ".gnu.linkonce.this_module" ELF section is special. It is
-	 * what modpost uses to refer to __this_module and let's use rely
-	 * on THIS_MODULE to point to &__this_module properly. The kernel's
-	 * modpost declares it on each modules's *.mod.c file. If the struct
-	 * module of the kernel changes a full kernel rebuild is required.
-	 *
-	 * We have a few expectaions for this special section, the following
-	 * code validates all this for us:
-	 *
-	 *   o Only one section must exist
-	 *   o We expect the kernel to always have to allocate it: SHF_ALLOC
-	 *   o The section size must match the kernel's run time's struct module
-	 *     size
-	 */
-	if (num_mod_secs != 1) {
-		pr_err("module %s: Only one .gnu.linkonce.this_module section must exist.\n",
+/**
+ * elf_validity_cache_index_mod() - Validates and caches this_module section
+ * @info: Load info to cache this_module on.
+ *        Must have &load_info->sechdrs and &load_info->secstrings populated
+ *
+ * The ".gnu.linkonce.this_module" ELF section is special. It is what modpost
+ * uses to refer to __this_module and let's use rely on THIS_MODULE to point
+ * to &__this_module properly. The kernel's modpost declares it on each
+ * modules's *.mod.c file. If the struct module of the kernel changes a full
+ * kernel rebuild is required.
+ *
+ * We have a few expectations for this special section, this function
+ * validates all this for us:
+ *
+ * * The section has contents
+ * * The section is unique
+ * * We expect the kernel to always have to allocate it: SHF_ALLOC
+ * * The section size must match the kernel's run time's struct module
+ *   size
+ *
+ * If all checks pass, the index will be cached in &load_info->index.mod
+ *
+ * Return: %0 on validation success, %-ENOEXEC on failure
+ */
+static int elf_validity_cache_index_mod(struct load_info *info)
+{
+	Elf_Shdr *shdr;
+	int mod_idx;
+
+	mod_idx = find_any_unique_sec(info, ".gnu.linkonce.this_module");
+	if (mod_idx <= 0) {
+		pr_err("module %s: Exactly one .gnu.linkonce.this_module section must exist.\n",
 		       info->name ?: "(missing .modinfo section or name field)");
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
 	shdr = &info->sechdrs[mod_idx];
 
-	/*
-	 * This is already implied on the switch above, however let's be
-	 * pedantic about it.
-	 */
 	if (shdr->sh_type == SHT_NOBITS) {
 		pr_err("module %s: .gnu.linkonce.this_module section must have a size set\n",
 		       info->name ?: "(missing .modinfo section or name field)");
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
 	if (!(shdr->sh_flags & SHF_ALLOC)) {
 		pr_err("module %s: .gnu.linkonce.this_module must occupy memory during process execution\n",
 		       info->name ?: "(missing .modinfo section or name field)");
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
 	if (shdr->sh_size != sizeof(struct module)) {
 		pr_err("module %s: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time\n",
 		       info->name ?: "(missing .modinfo section or name field)");
-		goto no_exec;
+		return -ENOEXEC;
 	}
 
 	info->index.mod = mod_idx;
 
-	/* This is temporary: point mod into copy of data. */
-	info->mod = (void *)info->hdr + shdr->sh_offset;
+	return 0;
+}
 
-	/*
-	 * If we didn't load the .modinfo 'name' field earlier, fall back to
-	 * on-disk struct mod 'name' field.
-	 */
-	if (!info->name)
-		info->name = info->mod->name;
+/**
+ * elf_validity_cache_index_sym() - Validate and cache symtab index
+ * @info: Load info to cache symtab index in.
+ *        Must have &load_info->sechdrs and &load_info->secstrings populated.
+ *
+ * Checks that there is exactly one symbol table, then caches its index in
+ * &load_info->index.sym.
+ *
+ * Return: %0 if valid, %-ENOEXEC on failure.
+ */
+static int elf_validity_cache_index_sym(struct load_info *info)
+{
+	unsigned int sym_idx;
+	unsigned int num_sym_secs = 0;
+	int i;
+
+	for (i = 1; i < info->hdr->e_shnum; i++) {
+		if (info->sechdrs[i].sh_type == SHT_SYMTAB) {
+			num_sym_secs++;
+			sym_idx = i;
+		}
+	}
+
+	if (num_sym_secs != 1) {
+		pr_warn("%s: module has no symbols (stripped?)\n",
+			info->name ?: "(missing .modinfo section or name field)");
+		return -ENOEXEC;
+	}
+
+	info->index.sym = sym_idx;
+
+	return 0;
+}
+
+/**
+ * elf_validity_cache_index_str() - Validate and cache strtab index
+ * @info: Load info to cache strtab index in.
+ *        Must have &load_info->sechdrs and &load_info->secstrings populated.
+ *        Must have &load_info->index.sym populated.
+ *
+ * Looks at the symbol table's associated string table, makes sure it is
+ * in-bounds, and caches it.
+ *
+ * Return: %0 if valid, %-ENOEXEC on failure.
+ */
+static int elf_validity_cache_index_str(struct load_info *info)
+{
+	unsigned int str_idx = info->sechdrs[info->index.sym].sh_link;
+
+	if (str_idx == SHN_UNDEF || str_idx > info->hdr->e_shnum) {
+		pr_err("Invalid ELF sh_link!=SHN_UNDEF(%d) or (sh_link(%d) >= hdr->e_shnum(%d)\n",
+		       str_idx, str_idx, info->hdr->e_shnum);
+		return -ENOEXEC;
+	}
+
+	info->index.str = str_idx;
+	return 0;
+}
+
+/**
+ * elf_validity_cache_index() - Resolve, validate, cache section indices
+ * @info:  Load info to read from and update.
+ *         &load_info->sechdrs and &load_info->secstrings must be populated.
+ * @flags: Load flags, relevant to suppress version loading, see
+ *         uapi/linux/module.h
+ *
+ * Populates &load_info->index, validating as it goes.
+ * See child functions for per-field validation:
+ *
+ * * elf_validity_cache_index_info()
+ * * elf_validity_cache_index_mod()
+ * * elf_validity_cache_index_sym()
+ * * elf_validity_cache_index_str()
+ *
+ * If versioning is not suppressed via flags, load the version index from
+ * a section called "__versions" with no validation.
+ *
+ * If CONFIG_SMP is enabled, load the percpu section by name with no
+ * validation.
+ *
+ * Return: 0 on success, negative error code if an index failed validation.
+ */
+static int elf_validity_cache_index(struct load_info *info, int flags)
+{
+	int err;
+
+	err = elf_validity_cache_index_info(info);
+	if (err < 0)
+		return err;
+	err = elf_validity_cache_index_mod(info);
+	if (err < 0)
+		return err;
+	err = elf_validity_cache_index_sym(info);
+	if (err < 0)
+		return err;
+	err = elf_validity_cache_index_str(info);
+	if (err < 0)
+		return err;
 
 	if (flags & MODULE_INIT_IGNORE_MODVERSIONS)
 		info->index.vers = 0; /* Pretend no __versions section! */
@@ -1894,9 +2069,125 @@ static int elf_validity_cache_copy(struct load_info *info, int flags)
 	info->index.pcpu = find_pcpusec(info);
 
 	return 0;
+}
 
-no_exec:
-	return -ENOEXEC;
+/**
+ * elf_validity_cache_strtab() - Validate and cache symbol string table
+ * @info: Load info to read from and update.
+ *        Must have &load_info->sechdrs and &load_info->secstrings populated.
+ *        Must have &load_info->index populated.
+ *
+ * Checks:
+ *
+ * * The string table is not empty.
+ * * The string table starts and ends with NUL (required by ELF spec).
+ * * Every &Elf_Sym->st_name offset in the symbol table is inbounds of the
+ *   string table.
+ *
+ * And caches the pointer as &load_info->strtab in @info.
+ *
+ * Return: 0 on success, negative error code if a check failed.
+ */
+static int elf_validity_cache_strtab(struct load_info *info)
+{
+	Elf_Shdr *str_shdr = &info->sechdrs[info->index.str];
+	Elf_Shdr *sym_shdr = &info->sechdrs[info->index.sym];
+	char *strtab = (char *)info->hdr + str_shdr->sh_offset;
+	Elf_Sym *syms = (void *)info->hdr + sym_shdr->sh_offset;
+	int i;
+
+	if (str_shdr->sh_size == 0) {
+		pr_err("empty symbol string table\n");
+		return -ENOEXEC;
+	}
+	if (strtab[0] != '\0') {
+		pr_err("symbol string table missing leading NUL\n");
+		return -ENOEXEC;
+	}
+	if (strtab[str_shdr->sh_size - 1] != '\0') {
+		pr_err("symbol string table isn't NUL terminated\n");
+		return -ENOEXEC;
+	}
+
+	/*
+	 * Now that we know strtab is correctly structured, check symbol
+	 * starts are inbounds before they're used later.
+	 */
+	for (i = 0; i < sym_shdr->sh_size / sizeof(*syms); i++) {
+		if (syms[i].st_name >= str_shdr->sh_size) {
+			pr_err("symbol name out of bounds in string table");
+			return -ENOEXEC;
+		}
+	}
+
+	info->strtab = strtab;
+	return 0;
+}
+
+/**
+ * elf_validity_cache_copy() - Validate kernel module and cache section info
+ *
+ * @info:  Contains the ELF info, and is where we fill out the computed section
+ *         indexes and pointers.
+ * @flags: Module load flags, see uapi/linux/module.h
+ *
+ * This validates a userspace ELF module matches our expectations so that it
+ * will be safe to assume it is well formed later in the load.
+ *
+ * Notable checks include:
+ *
+ * * ELF header is within range and valid
+ * * Section headers are inbounds and the expected size
+ * * Section contents are inbounds
+ * * The section name string section is inbounds, NUL terminated, and
+ *   section name offsets are inbounds.
+ * * Symbol table, symbol string table, and this_module are present
+ * * Symbol table, this_module, and modinfo are unique
+ * * The symbol table string table is inbounds, NUL leading and terminated,
+ *   and the symbol name offsets into the table are inbounds.
+ *
+ * Computed and cached values include:
+ *
+ * * &load_info->sechdrs (section header table)
+ * * &load_info->secstrings (section name strings)
+ * * &load_info->index (distinguished section indices)
+ * * &load_info->strtab (string table for decoding section symbols)
+ *
+ * As a last step we set info->mod to the temporary copy of the module in
+ * info->hdr. The final one will be allocated in move_module(). Any
+ * modifications we make to our copy of the module will be carried over
+ * to the final minted module.
+ *
+ * Return: 0 on success, negative error code if the ELF failed validation.
+ */
+static int elf_validity_cache_copy(struct load_info *info, int flags)
+{
+	int err;
+
+	err = elf_validity_cache_sechdrs(info);
+	if (err < 0)
+		return err;
+	err = elf_validity_cache_secstrings(info);
+	if (err < 0)
+		return err;
+	err = elf_validity_cache_index(info, flags);
+	if (err < 0)
+		return err;
+	err = elf_validity_cache_strtab(info);
+	if (err < 0)
+		return err;
+
+	/* This is temporary: point mod into copy of data. */
+	info->mod = (void *)info->hdr + info->sechdrs[info->index.mod].sh_offset;
+
+	/*
+	 * If we didn't load the .modinfo 'name' field earlier, fall back to
+	 * on-disk struct mod 'name' field.
+	 */
+	if (!info->name)
+		info->name = info->mod->name;
+
+	return 0;
 }
 
 #define COPY_CHUNK_SIZE (16*PAGE_SIZE)
-- 
2.43.0.rc0.421.g78406f8d94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/5] modpost: Extended modversion support
  2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
  2023-11-18  2:54 ` [PATCH v2 1/5] export_report: Rehabilitate script Matthew Maurer
  2023-11-18  2:54 ` [PATCH v2 2/5] modules: Refactor + kdoc elf_validity_cached_copy Matthew Maurer
@ 2023-11-18  2:54 ` Matthew Maurer
  2023-11-18 13:42   ` kernel test robot
  2023-11-18  2:54 ` [PATCH v2 4/5] rust: Allow MODVERSIONS Matthew Maurer
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Matthew Maurer @ 2023-11-18  2:54 UTC (permalink / raw)
  To: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain
  Cc: Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott,
	Matthew Maurer

Adds a new format for modversions which stores each field in a separate
ELF section. This initially adds support for variable length names, but
could later be used to add additional fields to modversions in a
backwards compatible way if needed.

Since PPC munges its version records to strip leading dots, we reproduce
the munging for the new format. Other architectures do not appear to
have architecture-specific usage of this information.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
---
 arch/powerpc/kernel/module_64.c | 25 ++++++++-
 kernel/module/internal.h        | 11 ++++
 kernel/module/main.c            | 92 ++++++++++++++++++++++++++++++---
 kernel/module/version.c         | 43 +++++++++++++++
 scripts/mod/modpost.c           | 37 +++++++++++--
 5 files changed, 195 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7112adc597a8..bff03627014c 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -22,6 +22,7 @@
 #include <asm/setup.h>
 #include <asm/sections.h>
 #include <asm/inst.h>
+#include <string.h>
 
 /* FIXME: We don't do .init separately.  To do this, we'd need to have
    a separate r2 value in the init and core section, and stub between
@@ -355,6 +356,24 @@ static void dedotify_versions(struct modversion_info *vers,
 		}
 }
 
+static void dedotify_ext_version_names(char *str_seq, unsigned long size)
+{
+	unsigned long out = 0;
+	unsigned long in;
+	char last = '\0';
+
+	for (in = 0; in < size; in++) {
+		if (last == '\0')
+			/* Skip all leading dots */
+			if (str_seq[in] == '.')
+				continue;
+		last = str_seq[in];
+		str_seq[out++] = last;
+	}
+	/* Zero the trailing portion of the names table for robustness */
+	bzero(&str_seq[out], size - out);
+}
+
 /*
  * Undefined symbols which refer to .funcname, hack to funcname. Make .TOC.
  * seem to be defined (value set later).
@@ -424,10 +443,12 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
 			me->arch.toc_section = i;
 			if (sechdrs[i].sh_addralign < 8)
 				sechdrs[i].sh_addralign = 8;
-		}
-		else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
+		} else if (strcmp(secstrings + sechdrs[i].sh_name, "__versions") == 0)
 			dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
 					  sechdrs[i].sh_size);
+		else if (strcmp(secstrings + sechdrs[i].sh_name, "__version_ext_names") == 0)
+			dedotify_ext_version_names((void *)hdr + sechdrs[i].sh_offset,
+						   sechdrs[i].sh_size);
 
 		if (sechdrs[i].sh_type == SHT_SYMTAB)
 			dedotify((void *)hdr + sechdrs[i].sh_offset,
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index d8dc52eb9c82..35d89c3f657c 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -86,6 +86,8 @@ struct load_info {
 		unsigned int vers;
 		unsigned int info;
 		unsigned int pcpu;
+		unsigned int vers_ext_crc;
+		unsigned int vers_ext_name;
 	} index;
 };
 
@@ -389,6 +391,15 @@ void module_layout(struct module *mod, struct modversion_info *ver, struct kerne
 		   struct kernel_symbol *ks, struct tracepoint * const *tp);
 int check_modstruct_version(const struct load_info *info, struct module *mod);
 int same_magic(const char *amagic, const char *bmagic, bool has_crcs);
+struct modversion_info_ext {
+	size_t remaining;
+	const s32 *crc;
+	const char *name;
+};
+void modversion_ext_start(const struct load_info *info, struct modversion_info_ext *ver);
+void modversion_ext_advance(struct modversion_info_ext *ver);
+#define for_each_modversion_info_ext(ver, info) \
+	for (modversion_ext_start(info, &ver); ver.remaining > 0; modversion_ext_advance(&ver))
 #else /* !CONFIG_MODVERSIONS */
 static inline int check_version(const struct load_info *info,
 				const char *symname,
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 8b2848b3183a..2b175b7953e7 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2021,6 +2021,82 @@ static int elf_validity_cache_index_str(struct load_info *info)
 	return 0;
 }
 
+/**
+ * elf_validity_cache_index_versions() - Validate and cache version indices
+ * @info:  Load info to cache version indices in.
+ *         Must have &load_info->sechdrs and &load_info->secstrings populated.
+ * @flags: Load flags, relevant to suppress version loading, see
+ *         uapi/linux/module.h
+ *
+ * If we're ignoring modversions based on @flags, zero all version indices
+ * and return validity. Othewrise check:
+ *
+ * * If "__version_ext_crcs" is present, "__version_ext_names" is present
+ * * There is a name present for every crc
+ *
+ * Then populate:
+ *
+ * * &load_info->index.vers
+ * * &load_info->index.vers_ext_crc
+ * * &load_info->index.vers_ext_names
+ *
+ * if present.
+ *
+ * Return: %0 if valid, %-ENOEXEC on failure.
+ */
+static int elf_validity_cache_index_versions(struct load_info *info, int flags)
+{
+	unsigned int vers_ext_crc;
+	unsigned int vers_ext_name;
+	size_t crc_count;
+	size_t remaining_len;
+	size_t name_size;
+	char *name;
+
+	/* If modversions were suppressed, pretend we didn't find any */
+	if (flags & MODULE_INIT_IGNORE_MODVERSIONS) {
+		info->index.vers = 0;
+		info->index.vers_ext_crc = 0;
+		info->index.vers_ext_name = 0;
+		return 0;
+	}
+
+	vers_ext_crc = find_sec(info, "__version_ext_crcs");
+	vers_ext_name = find_sec(info, "__version_ext_names");
+
+	/* If we have one field, we must have the other */
+	if (!!vers_ext_crc != !!vers_ext_name) {
+		pr_err("extended version crc+name presence does not match");
+		return -ENOEXEC;
+	}
+
+	/*
+	 * If we have extended version information, we should have the same
+	 * number of entries in every section.
+	 */
+	if (vers_ext_crc) {
+		crc_count = info->sechdrs[vers_ext_crc].sh_size / sizeof(s32);
+		name = (void *)info->hdr +
+			info->sechdrs[vers_ext_name].sh_offset;
+		remaining_len = info->sechdrs[vers_ext_name].sh_size;
+
+		while (crc_count--) {
+			name_size = strnlen(name, remaining_len) + 1;
+			if (name_size > remaining_len) {
+				pr_err("more extended version crcs than names");
+				return -ENOEXEC;
+			}
+			remaining_len -= name_size;
+			name += name_size;
+		}
+	}
+
+	info->index.vers = find_sec(info, "__versions");
+	info->index.vers_ext_crc = vers_ext_crc;
+	info->index.vers_ext_name = vers_ext_name;
+	return 0;
+}
+
 /**
  * elf_validity_cache_index() - Resolve, validate, cache section indices
  * @info:  Load info to read from and update.
@@ -2035,9 +2111,7 @@ static int elf_validity_cache_index_str(struct load_info *info)
  * * elf_validity_cache_index_mod()
  * * elf_validity_cache_index_sym()
  * * elf_validity_cache_index_str()
- *
- * If versioning is not suppressed via flags, load the version index from
- * a section called "__versions" with no validation.
+ * * elf_validity_cache_index_versions()
  *
  * If CONFIG_SMP is enabled, load the percpu section by name with no
  * validation.
@@ -2060,11 +2134,9 @@ static int elf_validity_cache_index(struct load_info *info, int flags)
 	err = elf_validity_cache_index_str(info);
 	if (err < 0)
 		return err;
-
-	if (flags & MODULE_INIT_IGNORE_MODVERSIONS)
-		info->index.vers = 0; /* Pretend no __versions section! */
-	else
-		info->index.vers = find_sec(info, "__versions");
+	err = elf_validity_cache_index_versions(info, flags);
+	if (err < 0)
+		return err;
 
 	info->index.pcpu = find_pcpusec(info);
 
@@ -2291,6 +2363,10 @@ static int rewrite_section_headers(struct load_info *info, int flags)
 
 	/* Track but don't keep modinfo and version sections. */
 	info->sechdrs[info->index.vers].sh_flags &= ~(unsigned long)SHF_ALLOC;
+	info->sechdrs[info->index.vers_ext_crc].sh_flags &=
+		~(unsigned long)SHF_ALLOC;
+	info->sechdrs[info->index.vers_ext_name].sh_flags &=
+		~(unsigned long)SHF_ALLOC;
 	info->sechdrs[info->index.info].sh_flags &= ~(unsigned long)SHF_ALLOC;
 
 	return 0;
diff --git a/kernel/module/version.c b/kernel/module/version.c
index 53f43ac5a73e..02d8340bdb57 100644
--- a/kernel/module/version.c
+++ b/kernel/module/version.c
@@ -19,11 +19,28 @@ int check_version(const struct load_info *info,
 	unsigned int versindex = info->index.vers;
 	unsigned int i, num_versions;
 	struct modversion_info *versions;
+	struct modversion_info_ext version_ext;
 
 	/* Exporting module didn't supply crcs?  OK, we're already tainted. */
 	if (!crc)
 		return 1;
 
+	/* If we have extended version info, rely on it */
+	if (info->index.vers_ext_crc) {
+		for_each_modversion_info_ext(version_ext, info) {
+			if (strcmp(version_ext.name, symname) != 0)
+				continue;
+			if (*version_ext.crc == *crc)
+				return 1;
+			pr_debug("Found checksum %X vs module %X\n",
+				 *crc, *version_ext.crc);
+			goto bad_version;
+		}
+		pr_warn_once("%s: no extended symbol version for %s\n",
+			     info->name, symname);
+		return 1;
+	}
+
 	/* No versions at all?  modprobe --force does this. */
 	if (versindex == 0)
 		return try_to_force_load(mod, symname) == 0;
@@ -87,6 +104,32 @@ int same_magic(const char *amagic, const char *bmagic,
 	return strcmp(amagic, bmagic) == 0;
 }
 
+void modversion_ext_start(const struct load_info *info,
+			  struct modversion_info_ext *start)
+{
+	unsigned int crc_idx = info->index.vers_ext_crc;
+	unsigned int name_idx = info->index.vers_ext_name;
+	Elf_Shdr *sechdrs = info->sechdrs;
+
+	/*
+	 * Both of these fields are needed for this to be useful
+	 * Any future fields should be initialized to NULL if absent.
+	 */
+	if ((crc_idx == 0) || (name_idx == 0))
+		start->remaining = 0;
+
+	start->crc = (const s32 *)sechdrs[crc_idx].sh_addr;
+	start->name = (const char *)sechdrs[name_idx].sh_addr;
+	start->remaining = sechdrs[crc_idx].sh_size / sizeof(*start->crc);
+}
+
+void modversion_ext_advance(struct modversion_info_ext *vers)
+{
+	vers->remaining--;
+	vers->crc++;
+	vers->name += strlen(vers->name) + 1;
+}
+
 /*
  * Generate the signature for all relevant module structures here.
  * If these change, we don't want to try to parse the module.
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 973b5e5ae2dd..530890bf5733 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1910,15 +1910,46 @@ static void add_versions(struct buffer *b, struct module *mod)
 			continue;
 		}
 		if (strlen(s->name) >= MODULE_NAME_LEN) {
-			error("too long symbol \"%s\" [%s.ko]\n",
-			      s->name, mod->name);
-			break;
+			/* this symbol will only be in the extended info */
+			continue;
 		}
 		buf_printf(b, "\t{ %#8x, \"%s\" },\n",
 			   s->crc, s->name);
 	}
 
 	buf_printf(b, "};\n");
+
+	buf_printf(b, "static const s32 ____version_ext_crcs[]\n");
+	buf_printf(b, "__used __section(\"__version_ext_crcs\") = {\n");
+	list_for_each_entry(s, &mod->unresolved_symbols, list) {
+		if (!s->module)
+			continue;
+		if (!s->crc_valid) {
+			/*
+			 * We already warned on this when producing the legacy
+			 * modversions table.
+			 */
+			continue;
+		}
+		buf_printf(b, "\t%#8x,\n", s->crc);
+	}
+	buf_printf(b, "};\n");
+
+	buf_printf(b, "static const char ____version_ext_names[]\n");
+	buf_printf(b, "__used __section(\"__version_ext_names\") =\n");
+	list_for_each_entry(s, &mod->unresolved_symbols, list) {
+		if (!s->module)
+			continue;
+		if (!s->crc_valid) {
+			/*
+			 * We already warned on this when producing the legacy
+			 * modversions table.
+			 */
+			continue;
+		}
+		buf_printf(b, "\t\"%s\\0\"\n", s->name);
+	}
+	buf_printf(b, ";\n");
 }
 
 static void add_depends(struct buffer *b, struct module *mod)
-- 
2.43.0.rc0.421.g78406f8d94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/5] rust: Allow MODVERSIONS
  2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
                   ` (2 preceding siblings ...)
  2023-11-18  2:54 ` [PATCH v2 3/5] modpost: Extended modversion support Matthew Maurer
@ 2023-11-18  2:54 ` Matthew Maurer
  2023-11-18  2:54 ` [PATCH v2 5/5] export_report: Use new version info format Matthew Maurer
  2023-11-22 15:49 ` [PATCH v2 0/5] MODVERSIONS + RUST Redux Masahiro Yamada
  5 siblings, 0 replies; 17+ messages in thread
From: Matthew Maurer @ 2023-11-18  2:54 UTC (permalink / raw)
  To: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain
  Cc: Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott,
	Matthew Maurer

With variable length symbol names from extended modversions, we can
enable MODVERSIONS alongside RUST due to support for its longer symbol
names.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
---
 init/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9ffb103fc927..6cac5b4db8f6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1885,7 +1885,6 @@ config RUST
 	bool "Rust support"
 	depends on HAVE_RUST
 	depends on RUST_IS_AVAILABLE
-	depends on !MODVERSIONS
 	depends on !GCC_PLUGINS
 	depends on !RANDSTRUCT
 	depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
-- 
2.43.0.rc0.421.g78406f8d94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/5] export_report: Use new version info format
  2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
                   ` (3 preceding siblings ...)
  2023-11-18  2:54 ` [PATCH v2 4/5] rust: Allow MODVERSIONS Matthew Maurer
@ 2023-11-18  2:54 ` Matthew Maurer
  2023-11-22 15:49 ` [PATCH v2 0/5] MODVERSIONS + RUST Redux Masahiro Yamada
  5 siblings, 0 replies; 17+ messages in thread
From: Matthew Maurer @ 2023-11-18  2:54 UTC (permalink / raw)
  To: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain
  Cc: Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott,
	Matthew Maurer

The new version info format has a superset of symbols in the old format.
Since this is a tool for in-tree modules, we don't need to parse the old
one with this tool any longer.

Signed-off-by: Matthew Maurer <mmaurer@google.com>
---
 scripts/export_report.pl | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/scripts/export_report.pl b/scripts/export_report.pl
index dcef915405f3..6a37df6f947f 100755
--- a/scripts/export_report.pl
+++ b/scripts/export_report.pl
@@ -114,31 +114,29 @@ foreach my $thismod (@allcfiles) {
 	}
 
 	my $state=0;
+	# State map:
+	# 0 - Looking for names
+	# 1 - Scanning names
+	# 2 - Done
 	while ( <$module> ) {
 		chomp;
 		if ($state == 0) {
-			$state = 1 if ($_ =~ /static const struct modversion_info/);
+			$state = 1 if ($_ =~ /__used __section\("__version_ext_names"\)/);
 			next;
 		}
 		if ($state == 1) {
-			$state = 2 if ($_ =~ /__used __section\("__versions"\)/);
-			next;
-		}
-		if ($state == 2) {
-			if ( $_ =~ /};/ ) {
-				$state = 3;
-				next;
-			}
-			if ( $_ !~ /0x[0-9a-f]+,/ ) {
+			if ( $_ =~ /;/ ) {
+				$state = 2;
 				next;
 			}
-			my $sym = (split /([,"])/,)[4];
+			$_ =~ /"(.*)\\0"/;
+			my $sym = $1;
 			my ($module, $value, $symbol, $gpl) = @{$SYMBOL{$sym}};
 			$SYMBOL{ $sym } =  [ $module, $value+1, $symbol, $gpl];
 			push(@{$MODULE{$thismod}} , $sym);
 		}
 	}
-	if ($state != 3) {
+	if ($state != 2) {
 		warn "WARNING:$thismod is not built with CONFIG_MODVERSIONS enabled\n";
 		$modversion_warnings++;
 	}
-- 
2.43.0.rc0.421.g78406f8d94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/5] export_report: Rehabilitate script
  2023-11-18  2:54 ` [PATCH v2 1/5] export_report: Rehabilitate script Matthew Maurer
@ 2023-11-18 11:35   ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2023-11-18 11:35 UTC (permalink / raw)
  To: Matthew Maurer
  Cc: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Sat, Nov 18, 2023 at 02:54:42AM +0000, Matthew Maurer wrote:
> * modules.order has .o files when in a build dir, support this
> * .mod.c source layout has changed, update regexes to match
> * Add a stage 3, to be more robust against additional .mod.c content

When you have to list different things you do in a patch, that is a huge
hint that you need to break up your patch into smaller pieces.

Remember, each patch can only do one logical thing.  I know it feels
odd, but it makes it easier to review.

This patch, as-is, is nothing that I would be able to take, please make
it a series.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] modules: Refactor + kdoc elf_validity_cached_copy
  2023-11-18  2:54 ` [PATCH v2 2/5] modules: Refactor + kdoc elf_validity_cached_copy Matthew Maurer
@ 2023-11-18 11:36   ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2023-11-18 11:36 UTC (permalink / raw)
  To: Matthew Maurer
  Cc: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Sat, Nov 18, 2023 at 02:54:43AM +0000, Matthew Maurer wrote:
> Functionality is almost identical, just structured for better
> documentation and readability. Changes:
> 
> * Section names are checked for *all* non-SHT_NULL sections, not just
>   SHF_ALLOC sections. We have code that accesses section names of
>   non-SHF_ALLOC sections (see find_any_sec for example)
> * The section name check occurs *before* strcmping on section names.
>   Previously, it was possible to use an out-of-bounds offset to strcmp
>   against ".modinfo" or ".gnu.linkonce.this_module"
> * strtab is checked for NUL lead+termination and nonzero size
> * The symbol table is swept to ensure offsets are inbounds of strtab
> 
> While some of these oversights would normally be worrying, all of the
> potentially unverified accesses occur after signature check, and only in
> response to a user who can load a kernel module.
> 
> Signed-off-by: Matthew Maurer <mmaurer@google.com>
> ---
>  kernel/module/internal.h |   7 +-
>  kernel/module/main.c     | 585 +++++++++++++++++++++++++++++----------
>  2 files changed, 444 insertions(+), 148 deletions(-)

Again, this needs to be broken into much smaller pieces before we can
even review it.  Would you want to review this?

thanks,

greg "think of the reviewers" k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/5] modpost: Extended modversion support
  2023-11-18  2:54 ` [PATCH v2 3/5] modpost: Extended modversion support Matthew Maurer
@ 2023-11-18 13:42   ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2023-11-18 13:42 UTC (permalink / raw)
  To: Matthew Maurer, Masahiro Yamada, Nick Desaulniers, Miguel Ojeda,
	Gary Guo, Luis Chamberlain
  Cc: oe-kbuild-all, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott, Matthew Maurer

Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on mcgrof/modules-next]
[also build test ERROR on powerpc/next powerpc/fixes masahiroy-kbuild/for-next masahiroy-kbuild/fixes linus/master v6.7-rc1 next-20231117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Maurer/export_report-Rehabilitate-script/20231118-110040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next
patch link:    https://lore.kernel.org/r/20231118025748.2778044-4-mmaurer%40google.com
patch subject: [PATCH v2 3/5] modpost: Extended modversion support
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20231118/202311182118.zJqkg301-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231118/202311182118.zJqkg301-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311182118.zJqkg301-lkp@intel.com/

All errors (new ones prefixed by >>):

>> arch/powerpc/kernel/module_64.c:25:10: fatal error: string.h: No such file or directory
      25 | #include <string.h>
         |          ^~~~~~~~~~
   compilation terminated.


vim +25 arch/powerpc/kernel/module_64.c

     8	
     9	#include <linux/module.h>
    10	#include <linux/elf.h>
    11	#include <linux/moduleloader.h>
    12	#include <linux/err.h>
    13	#include <linux/vmalloc.h>
    14	#include <linux/ftrace.h>
    15	#include <linux/bug.h>
    16	#include <linux/uaccess.h>
    17	#include <linux/kernel.h>
    18	#include <asm/module.h>
    19	#include <asm/firmware.h>
    20	#include <asm/code-patching.h>
    21	#include <linux/sort.h>
    22	#include <asm/setup.h>
    23	#include <asm/sections.h>
    24	#include <asm/inst.h>
  > 25	#include <string.h>
    26	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
                   ` (4 preceding siblings ...)
  2023-11-18  2:54 ` [PATCH v2 5/5] export_report: Use new version info format Matthew Maurer
@ 2023-11-22 15:49 ` Masahiro Yamada
  2023-11-22 21:04   ` Matthew Maurer
  5 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2023-11-22 15:49 UTC (permalink / raw)
  To: Matthew Maurer
  Cc: Nick Desaulniers, Miguel Ojeda, Gary Guo, Luis Chamberlain,
	Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott

On Sat, Nov 18, 2023 at 11:58 AM Matthew Maurer <mmaurer@google.com> wrote:
>
> The goal of this patch series is to allow MODVERSIONS and RUST to be
> enabled simultaneously. The primary issue with doing this at the moment
> is that Rust uses some extremely long symbol names - for those
> unfamiliar with Rust, it may be helpful to think of some of the mangled
> C++ names you may have seen in binaries in the past.
>
> Previously, Gary Guo attempted to accomplish this by modifying the
> existing modversion format [1] to support variable-length symbol names.
> This was unfortunately considered to be a potential userspace break
> because kmod tools inspect this kernel module metadata. Masahiro Yamada
> suggested [2] that this could instead be done with a section per-field.
> This gives us the ability to be more flexible with this format in the
> future, as a new field or additional information will be in a new
> section which userspace tools will not yet attempt to read.
>
> In the previous version of this patchset, Luis Chamberlain suggested [3]
> I move validation out of the version checking and into the elf validity
> checker, and also add kernel-docs over there. I found
> elf_validity_cached_copy to be fairly dense and difficult to directly
> describe, so I refactored it into easier to explain pieces. In the
> process, I found a few missing checks and added those as well. See
> [PATCH 2/5] for more details. If this is too much, I'm more than happy
> to drop this patch from the series in favor of just adding the
> kernel-doc to the original code, but figured I'd offer it up in case the
> added clarity and checks were valuable.
>
> [1] https://lore.kernel.org/lkml/20230111161155.1349375-1-gary@garyguo.net/
> [2] https://lore.kernel.org/lkml/CAK7LNATsuszFR7JB5ZkqVS1W=hWr9=E7bTf+MvgJ+NXT3aZNwg@mail.gmail.com/
> [3] https://lore.kernel.org/lkml/ZVZNh%2FPA5HiVRkeb@bombadil.infradead.org/





I want to know why this is useful.


To clarify my question, let me first explain
what the modversion is.



In C, a function callee and callers must agree
with the interface of the function.


This is usually done by having a function prototype
in a header file.


Say, we have a function declaration

    int foo(int x, int y);

in include/linux/foo.h


Then, the C file that defines foo() and all C files
that call it must include <linux/foo.h> so that
argument mismatch can be detected.




Same for EXPORT_SYMBOL; the symbol provider and consumers
must agree with the interface of exported symbols.

In the kernel, however, there is no promise for in-kernel ABI
compatibility across different kernel versions.
The kernel only promises the compatibility of the userspace interface.


To load modules, by principle, vmlinux and modules must have
the same version.

To slightly loosen the limitation, CONFIG_MODVERSIONS was
introduced; when it is enabled, you can load a module
as long as all the prototypes of exported symbols match.

To do this, we need to encode information about prototypes.


This is done by a tool called genksyms (scripts/genksyms/genksyms).



Say, we have this code:


int foo(int x, int y)
{
     // genksyms does not care about
     // the function body.
}
EXPORT_SYMBOL(foo);


Genksyms parses the code and computes a CRC value for 'foo'.
Genksyms is only interested in the function name and its prototype.

It sees

   int foo(int, int)

and it transforms it into a CRC.


Any change to the prototype results in a
different CRC, so the module subsystem
can check the interface compatibility
before loading a module.


It is obvious that this is impossible for Rust source
because scripts/genksyms/genksyms is only able to
parse C code.


Then, what is happening here?

See rust/exports.c


  #define EXPORT_SYMBOL_RUST_GPL(sym) extern int sym; EXPORT_SYMBOL_GPL(sym)


The global scope symbols in Rust (i.e. 'pub) are automatically
exported, and all of them are visible as 'int' variables
from C world.


Genksyms will see this code:

  extern int foo;
  EXPORT_SYMBOL_GPL(foo);

Of course, this is not a true prototype.
The real signature on the Rust side might be:

  fn foo(x: i32, y: i32) -> i32


So, even if you enable CONFIG_MODVERSIONS,
nothing is checked for Rust.
Genksyms computes a CRC from "int foo", and
the module subsystem confirms it is a "int"
variable.

We know this check always succeeds.

Why is this useful?






> Matthew Maurer (5):
>   export_report: Rehabilitate script
>   modules: Refactor + kdoc elf_validity_cached_copy
>   modpost: Extended modversion support
>   rust: Allow MODVERSIONS
>   export_report: Use new version info format
>
>  arch/powerpc/kernel/module_64.c |  25 +-
>  init/Kconfig                    |   1 -
>  kernel/module/internal.h        |  18 +-
>  kernel/module/main.c            | 663 +++++++++++++++++++++++++-------
>  kernel/module/version.c         |  43 +++
>  scripts/export_report.pl        |  17 +-
>  scripts/mod/modpost.c           |  37 +-
>  7 files changed, 642 insertions(+), 162 deletions(-)
>
> --
> 2.43.0.rc0.421.g78406f8d94-goog
>


--
Best Regards





Masahiro Yamada

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-22 15:49 ` [PATCH v2 0/5] MODVERSIONS + RUST Redux Masahiro Yamada
@ 2023-11-22 21:04   ` Matthew Maurer
  2023-11-23  9:05     ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Maurer @ 2023-11-22 21:04 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Nick Desaulniers, Miguel Ojeda, Gary Guo, Luis Chamberlain,
	Nathan Chancellor, Nicolas Schier, linuxppc-dev, linux-kernel,
	linux-modules, linux-kbuild, rust-for-linux, Laura Abbott

> So, even if you enable CONFIG_MODVERSIONS,
> nothing is checked for Rust.
> Genksyms computes a CRC from "int foo", and
> the module subsystem confirms it is a "int"
> variable.
>
> We know this check always succeeds.
>
> Why is this useful?
The reason this is immediately useful is that it allows us to have Rust
in use with a kernel where C modules are able to benefit from MODVERSIONS
checking. The check would effectively be a no-op for now, as you have correctly
determined, but we could refine it to make it more restrictive later.
Since the
existing C approach errs on the side of "it could work" rather than "it will
work", I thought being more permissive was the correct initial solution.

If we want to err on the other side (modversions passes, so we're pretty sure
it will work), I could add to the last patch support for using .rmeta files as
the CRC source for Rust symbols. This would essentially say that the interface
for the entire compilation unit has to stay the same rather than just that one
function. We could potentially loosen this requirement in the future.

With regards to future directions that likely won't work for loosening it:
Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
teach genksyms to open it up and split out the pieces for specific functions.
Extending genksyms to parse Rust would also not solve the situation -
layouts are allowed to differ across compiler versions or even (in rare
cases) seemingly unrelated code changes.

Future directions that might work for loosening it:
* Generating crcs from debuginfo + compiler + flags
* Adding a feature to the rust compiler to dump this information. This
is likely to
  get pushback because Rust's current stance is that there is no ability to load
  object code built against a different library.

Would setting up Rust symbols so that they have a crc built out of .rmeta be
sufficient for you to consider this useful? If not, can you help me understand
what level of precision would be required?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-22 21:04   ` Matthew Maurer
@ 2023-11-23  9:05     ` Greg KH
  2023-11-23 11:38       ` Masahiro Yamada
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2023-11-23  9:05 UTC (permalink / raw)
  To: Matthew Maurer
  Cc: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Wed, Nov 22, 2023 at 01:04:09PM -0800, Matthew Maurer wrote:
> > So, even if you enable CONFIG_MODVERSIONS,
> > nothing is checked for Rust.
> > Genksyms computes a CRC from "int foo", and
> > the module subsystem confirms it is a "int"
> > variable.
> >
> > We know this check always succeeds.
> >
> > Why is this useful?
> The reason this is immediately useful is that it allows us to have Rust
> in use with a kernel where C modules are able to benefit from MODVERSIONS
> checking. The check would effectively be a no-op for now, as you have correctly
> determined, but we could refine it to make it more restrictive later.
> Since the
> existing C approach errs on the side of "it could work" rather than "it will
> work", I thought being more permissive was the correct initial solution.

But it's just providing "fake" information to the CRC checker, which
means that the guarantee of a ABI check is not true at all.

So the ask for the user of "ensure that the ABI checking is correct" is
being circumvented here, and any change in the rust side can not be
detected at all.

The kernel is a "whole", either an option works for it, or it doesn't,
and you are splitting that guarantee here by saying "modversions will
only work for a portion of the kernel, not the whole thing" which is
going to cause problems for when people expect it to actually work
properly.

So, I'd strongly recommend fixing this for the rust code if you wish to
allow modversions to be enabled at all.

> With regards to future directions that likely won't work for loosening it:
> Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> teach genksyms to open it up and split out the pieces for specific functions.
> Extending genksyms to parse Rust would also not solve the situation -
> layouts are allowed to differ across compiler versions or even (in rare
> cases) seemingly unrelated code changes.

What do you mean by "layout" here?  Yes, the crcs can be different
across compiler versions and seemingly unrelated code changes (genksyms
is VERY fragile) but that's ok, that's not what you are checking here.
You want to know if the rust function signature changes or not from the
last time you built the code, with the same compiler and options, that's
all you are verifying.

> Future directions that might work for loosening it:
> * Generating crcs from debuginfo + compiler + flags
> * Adding a feature to the rust compiler to dump this information. This
> is likely to
>   get pushback because Rust's current stance is that there is no ability to load
>   object code built against a different library.

Why not parse the function signature like we do for C?

> Would setting up Rust symbols so that they have a crc built out of .rmeta be
> sufficient for you to consider this useful? If not, can you help me understand
> what level of precision would be required?

What exactly does .rmeta have to do with the function signature?  That's
all you care about here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-23  9:05     ` Greg KH
@ 2023-11-23 11:38       ` Masahiro Yamada
  2023-11-23 12:12         ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Masahiro Yamada @ 2023-11-23 11:38 UTC (permalink / raw)
  To: Greg KH
  Cc: Matthew Maurer, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Thu, Nov 23, 2023 at 6:05 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Wed, Nov 22, 2023 at 01:04:09PM -0800, Matthew Maurer wrote:
> > > So, even if you enable CONFIG_MODVERSIONS,
> > > nothing is checked for Rust.
> > > Genksyms computes a CRC from "int foo", and
> > > the module subsystem confirms it is a "int"
> > > variable.
> > >
> > > We know this check always succeeds.
> > >
> > > Why is this useful?
> > The reason this is immediately useful is that it allows us to have Rust
> > in use with a kernel where C modules are able to benefit from MODVERSIONS
> > checking. The check would effectively be a no-op for now, as you have correctly
> > determined, but we could refine it to make it more restrictive later.
> > Since the
> > existing C approach errs on the side of "it could work" rather than "it will
> > work", I thought being more permissive was the correct initial solution.
>
> But it's just providing "fake" information to the CRC checker, which
> means that the guarantee of a ABI check is not true at all.
>
> So the ask for the user of "ensure that the ABI checking is correct" is
> being circumvented here, and any change in the rust side can not be
> detected at all.
>
> The kernel is a "whole", either an option works for it, or it doesn't,
> and you are splitting that guarantee here by saying "modversions will
> only work for a portion of the kernel, not the whole thing" which is
> going to cause problems for when people expect it to actually work
> properly.
>
> So, I'd strongly recommend fixing this for the rust code if you wish to
> allow modversions to be enabled at all.
>
> > With regards to future directions that likely won't work for loosening it:
> > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > teach genksyms to open it up and split out the pieces for specific functions.
> > Extending genksyms to parse Rust would also not solve the situation -
> > layouts are allowed to differ across compiler versions or even (in rare
> > cases) seemingly unrelated code changes.
>
> What do you mean by "layout" here?  Yes, the crcs can be different
> across compiler versions and seemingly unrelated code changes (genksyms
> is VERY fragile) but that's ok, that's not what you are checking here.
> You want to know if the rust function signature changes or not from the
> last time you built the code, with the same compiler and options, that's
> all you are verifying.
>
> > Future directions that might work for loosening it:
> > * Generating crcs from debuginfo + compiler + flags
> > * Adding a feature to the rust compiler to dump this information. This
> > is likely to
> >   get pushback because Rust's current stance is that there is no ability to load
> >   object code built against a different library.
>
> Why not parse the function signature like we do for C?
>
> > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > sufficient for you to consider this useful? If not, can you help me understand
> > what level of precision would be required?
>
> What exactly does .rmeta have to do with the function signature?  That's
> all you care about here.




rmeta is generated per crate.

CRC is computed per symbol.

They have different granularity.
It is weird to refuse a module for incompatibility
of a symbol that it is not using at all.




> thanks,
>
> greg k-h




--
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-23 11:38       ` Masahiro Yamada
@ 2023-11-23 12:12         ` Greg KH
  2023-11-27 19:27           ` Matthew Maurer
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2023-11-23 12:12 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Matthew Maurer, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Thu, Nov 23, 2023 at 08:38:45PM +0900, Masahiro Yamada wrote:
> On Thu, Nov 23, 2023 at 6:05 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Nov 22, 2023 at 01:04:09PM -0800, Matthew Maurer wrote:
> > > > So, even if you enable CONFIG_MODVERSIONS,
> > > > nothing is checked for Rust.
> > > > Genksyms computes a CRC from "int foo", and
> > > > the module subsystem confirms it is a "int"
> > > > variable.
> > > >
> > > > We know this check always succeeds.
> > > >
> > > > Why is this useful?
> > > The reason this is immediately useful is that it allows us to have Rust
> > > in use with a kernel where C modules are able to benefit from MODVERSIONS
> > > checking. The check would effectively be a no-op for now, as you have correctly
> > > determined, but we could refine it to make it more restrictive later.
> > > Since the
> > > existing C approach errs on the side of "it could work" rather than "it will
> > > work", I thought being more permissive was the correct initial solution.
> >
> > But it's just providing "fake" information to the CRC checker, which
> > means that the guarantee of a ABI check is not true at all.
> >
> > So the ask for the user of "ensure that the ABI checking is correct" is
> > being circumvented here, and any change in the rust side can not be
> > detected at all.
> >
> > The kernel is a "whole", either an option works for it, or it doesn't,
> > and you are splitting that guarantee here by saying "modversions will
> > only work for a portion of the kernel, not the whole thing" which is
> > going to cause problems for when people expect it to actually work
> > properly.
> >
> > So, I'd strongly recommend fixing this for the rust code if you wish to
> > allow modversions to be enabled at all.
> >
> > > With regards to future directions that likely won't work for loosening it:
> > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > teach genksyms to open it up and split out the pieces for specific functions.
> > > Extending genksyms to parse Rust would also not solve the situation -
> > > layouts are allowed to differ across compiler versions or even (in rare
> > > cases) seemingly unrelated code changes.
> >
> > What do you mean by "layout" here?  Yes, the crcs can be different
> > across compiler versions and seemingly unrelated code changes (genksyms
> > is VERY fragile) but that's ok, that's not what you are checking here.
> > You want to know if the rust function signature changes or not from the
> > last time you built the code, with the same compiler and options, that's
> > all you are verifying.
> >
> > > Future directions that might work for loosening it:
> > > * Generating crcs from debuginfo + compiler + flags
> > > * Adding a feature to the rust compiler to dump this information. This
> > > is likely to
> > >   get pushback because Rust's current stance is that there is no ability to load
> > >   object code built against a different library.
> >
> > Why not parse the function signature like we do for C?
> >
> > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > sufficient for you to consider this useful? If not, can you help me understand
> > > what level of precision would be required?
> >
> > What exactly does .rmeta have to do with the function signature?  That's
> > all you care about here.
> 
> 
> 
> 
> rmeta is generated per crate.
> 
> CRC is computed per symbol.
> 
> They have different granularity.
> It is weird to refuse a module for incompatibility
> of a symbol that it is not using at all.

I agree, this should be on a per-symbol basis, so the Rust
infrastructure in the kernel needs to be fixed up to support this
properly, not just ignored like this patchset does.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-23 12:12         ` Greg KH
@ 2023-11-27 19:27           ` Matthew Maurer
  2023-11-28  8:05             ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Maurer @ 2023-11-27 19:27 UTC (permalink / raw)
  To: Greg KH
  Cc: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

> > >
> > > > With regards to future directions that likely won't work for loosening it:
> > > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > > teach genksyms to open it up and split out the pieces for specific functions.
> > > > Extending genksyms to parse Rust would also not solve the situation -
> > > > layouts are allowed to differ across compiler versions or even (in rare
> > > > cases) seemingly unrelated code changes.
> > >
> > > What do you mean by "layout" here?  Yes, the crcs can be different
> > > across compiler versions and seemingly unrelated code changes (genksyms
> > > is VERY fragile) but that's ok, that's not what you are checking here.
> > > You want to know if the rust function signature changes or not from the
> > > last time you built the code, with the same compiler and options, that's
> > > all you are verifying.
What I mean by layout here is that if you write in Rust:
struct Foo {
  x: i32,
  y: i32,
}
it is not guaranteed to have the same layout across different compilations, even
within the same compiler. See
https://doc.rust-lang.org/reference/type-layout.html#the-rust-representation
Specifically, the compiler is allowed to arbitrarily insert padding,
reorder fields, etc.
on the same code as long as the overall alignment of the struct and individual
alignment of the fields remains correct and non-overlapping.

This means the compiler is *explicitly* allowed to, for example, permute x and y
as an optimization. In the above example this is unlikely, but if you
instead consider
struct Bar {
  x: i8,
  y: i64,
  z: i8,
}
It's easy to see why the compiler might decide to structure this as
y,x,z to reduce the
size of the struct. Those optimization decisions may be affected by
any other part of
the code, PGO, etc.
> > >
> > > > Future directions that might work for loosening it:
> > > > * Generating crcs from debuginfo + compiler + flags
> > > > * Adding a feature to the rust compiler to dump this information. This
> > > > is likely to
> > > >   get pushback because Rust's current stance is that there is no ability to load
> > > >   object code built against a different library.
> > >
> > > Why not parse the function signature like we do for C?
Because the function signature is insufficient to check the ABI, see above.
> > >
> > > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > > sufficient for you to consider this useful? If not, can you help me understand
> > > > what level of precision would be required?
> > >
> > > What exactly does .rmeta have to do with the function signature?  That's
> > > all you care about here.
The .rmeta file contains the decisions the compiler made about layout
in the crate
you're interfacing with. For example, the choice to encode Bar
with a yxz field order would be written into the .rmeta file.
> >
> >
> >
> >
> > rmeta is generated per crate.
> >
> > CRC is computed per symbol.
> >
> > They have different granularity.
> > It is weird to refuse a module for incompatibility
> > of a symbol that it is not using at all.
>
> I agree, this should be on a per-symbol basis, so the Rust
> infrastructure in the kernel needs to be fixed up to support this
> properly, not just ignored like this patchset does.
I agree there is a divergence here, I tried to point it out so that it
wouldn't be
a surprise later. The .rmeta file itself (which is the only way we
could know that
the ABI actually matches, because layout decisions are in there) is an unstable
format, which is why I would be reluctant to try to parse it and find only the
relevant portions to hash. This isn't just a "technically unstable"
format, but one
in which the compiler essentially just serializes out relevant internal data
structures, so any parser for it will involve significant alterations
on compiler
updates, which doesn't seem like a good plan.
>
> thanks,
>
> greg k-h
Given the above additional information, would you be interested in a patchset
which either:

A. Computes the CRC off the Rust type signature, knowing the compiler is
allowed to change the ABI based on information not contained in the CRC.
B. Uses the CRC of the .rmeta file, knowing, as was pointed out, that this
effectively contains the ABI of every symbol in the compilation unit, as well
as inline functions and polymorphic functions.

If neither of these works, we likely can't turn on MODVERSIONS+RUST until
further work is done upstream in the compiler to export some of this data in
an at least semi-stable fashion.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-27 19:27           ` Matthew Maurer
@ 2023-11-28  8:05             ` Greg KH
  2023-11-28  8:44               ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2023-11-28  8:05 UTC (permalink / raw)
  To: Matthew Maurer
  Cc: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Mon, Nov 27, 2023 at 11:27:07AM -0800, Matthew Maurer wrote:
> > > >
> > > > > With regards to future directions that likely won't work for loosening it:
> > > > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > > > teach genksyms to open it up and split out the pieces for specific functions.
> > > > > Extending genksyms to parse Rust would also not solve the situation -
> > > > > layouts are allowed to differ across compiler versions or even (in rare
> > > > > cases) seemingly unrelated code changes.
> > > >
> > > > What do you mean by "layout" here?  Yes, the crcs can be different
> > > > across compiler versions and seemingly unrelated code changes (genksyms
> > > > is VERY fragile) but that's ok, that's not what you are checking here.
> > > > You want to know if the rust function signature changes or not from the
> > > > last time you built the code, with the same compiler and options, that's
> > > > all you are verifying.
> What I mean by layout here is that if you write in Rust:
> struct Foo {
>   x: i32,
>   y: i32,
> }
> it is not guaranteed to have the same layout across different compilations, even
> within the same compiler. See
> https://doc.rust-lang.org/reference/type-layout.html#the-rust-representation

Then you are going to have big problems, sorry.

> Specifically, the compiler is allowed to arbitrarily insert padding,
> reorder fields, etc.
> on the same code as long as the overall alignment of the struct and individual
> alignment of the fields remains correct and non-overlapping.
> 
> This means the compiler is *explicitly* allowed to, for example, permute x and y
> as an optimization. In the above example this is unlikely, but if you
> instead consider
> struct Bar {
>   x: i8,
>   y: i64,
>   z: i8,
> }
> It's easy to see why the compiler might decide to structure this as
> y,x,z to reduce the
> size of the struct. Those optimization decisions may be affected by
> any other part of
> the code, PGO, etc.

Then you all need to figure out some way to determine how the compiler
layed out the structure after it compiled/optimized it and be able to
compare it to previous builds (or just generate a crc based on the
layout it chose.)

> > > > > Future directions that might work for loosening it:
> > > > > * Generating crcs from debuginfo + compiler + flags
> > > > > * Adding a feature to the rust compiler to dump this information. This
> > > > > is likely to
> > > > >   get pushback because Rust's current stance is that there is no ability to load
> > > > >   object code built against a different library.
> > > >
> > > > Why not parse the function signature like we do for C?
> Because the function signature is insufficient to check the ABI, see above.
> > > >
> > > > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > > > sufficient for you to consider this useful? If not, can you help me understand
> > > > > what level of precision would be required?
> > > >
> > > > What exactly does .rmeta have to do with the function signature?  That's
> > > > all you care about here.
> The .rmeta file contains the decisions the compiler made about layout
> in the crate
> you're interfacing with. For example, the choice to encode Bar
> with a yxz field order would be written into the .rmeta file.

Ok, then yes, can you parse the .rmeta file to get that information?

> > > rmeta is generated per crate.
> > >
> > > CRC is computed per symbol.
> > >
> > > They have different granularity.
> > > It is weird to refuse a module for incompatibility
> > > of a symbol that it is not using at all.
> >
> > I agree, this should be on a per-symbol basis, so the Rust
> > infrastructure in the kernel needs to be fixed up to support this
> > properly, not just ignored like this patchset does.
> I agree there is a divergence here, I tried to point it out so that it
> wouldn't be
> a surprise later. The .rmeta file itself (which is the only way we
> could know that
> the ABI actually matches, because layout decisions are in there) is an unstable
> format, which is why I would be reluctant to try to parse it and find only the
> relevant portions to hash. This isn't just a "technically unstable"
> format, but one
> in which the compiler essentially just serializes out relevant internal data
> structures, so any parser for it will involve significant alterations
> on compiler
> updates, which doesn't seem like a good plan.
> >
> > thanks,
> >
> > greg k-h
> Given the above additional information, would you be interested in a patchset
> which either:
> 
> A. Computes the CRC off the Rust type signature, knowing the compiler is
> allowed to change the ABI based on information not contained in the CRC.

No.

> B. Uses the CRC of the .rmeta file, knowing, as was pointed out, that this
> effectively contains the ABI of every symbol in the compilation unit, as well
> as inline functions and polymorphic functions.

No.

> If neither of these works, we likely can't turn on MODVERSIONS+RUST until
> further work is done upstream in the compiler to export some of this data in
> an at least semi-stable fashion.

Looks like you need something a bit more fine-grained, as pointed out
above.  why not parse the structure/function information in the .rmeta
file?  Is the format of that file not stable?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] MODVERSIONS + RUST Redux
  2023-11-28  8:05             ` Greg KH
@ 2023-11-28  8:44               ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2023-11-28  8:44 UTC (permalink / raw)
  To: Matthew Maurer
  Cc: Masahiro Yamada, Nick Desaulniers, Miguel Ojeda, Gary Guo,
	Luis Chamberlain, Nathan Chancellor, Nicolas Schier, linuxppc-dev,
	linux-kernel, linux-modules, linux-kbuild, rust-for-linux,
	Laura Abbott

On Tue, Nov 28, 2023 at 08:05:26AM +0000, Greg KH wrote:
> On Mon, Nov 27, 2023 at 11:27:07AM -0800, Matthew Maurer wrote:
> > > > >
> > > > > > With regards to future directions that likely won't work for loosening it:
> > > > > > Unfortunately, the .rmeta format itself is not stable, so I wouldn't want to
> > > > > > teach genksyms to open it up and split out the pieces for specific functions.
> > > > > > Extending genksyms to parse Rust would also not solve the situation -
> > > > > > layouts are allowed to differ across compiler versions or even (in rare
> > > > > > cases) seemingly unrelated code changes.
> > > > >
> > > > > What do you mean by "layout" here?  Yes, the crcs can be different
> > > > > across compiler versions and seemingly unrelated code changes (genksyms
> > > > > is VERY fragile) but that's ok, that's not what you are checking here.
> > > > > You want to know if the rust function signature changes or not from the
> > > > > last time you built the code, with the same compiler and options, that's
> > > > > all you are verifying.
> > What I mean by layout here is that if you write in Rust:
> > struct Foo {
> >   x: i32,
> >   y: i32,
> > }
> > it is not guaranteed to have the same layout across different compilations, even
> > within the same compiler. See
> > https://doc.rust-lang.org/reference/type-layout.html#the-rust-representation
> 
> Then you are going to have big problems, sorry.
> 
> > Specifically, the compiler is allowed to arbitrarily insert padding,
> > reorder fields, etc.
> > on the same code as long as the overall alignment of the struct and individual
> > alignment of the fields remains correct and non-overlapping.
> > 
> > This means the compiler is *explicitly* allowed to, for example, permute x and y
> > as an optimization. In the above example this is unlikely, but if you
> > instead consider
> > struct Bar {
> >   x: i8,
> >   y: i64,
> >   z: i8,
> > }
> > It's easy to see why the compiler might decide to structure this as
> > y,x,z to reduce the
> > size of the struct. Those optimization decisions may be affected by
> > any other part of
> > the code, PGO, etc.
> 
> Then you all need to figure out some way to determine how the compiler
> layed out the structure after it compiled/optimized it and be able to
> compare it to previous builds (or just generate a crc based on the
> layout it chose.)
> 
> > > > > > Future directions that might work for loosening it:
> > > > > > * Generating crcs from debuginfo + compiler + flags
> > > > > > * Adding a feature to the rust compiler to dump this information. This
> > > > > > is likely to
> > > > > >   get pushback because Rust's current stance is that there is no ability to load
> > > > > >   object code built against a different library.
> > > > >
> > > > > Why not parse the function signature like we do for C?
> > Because the function signature is insufficient to check the ABI, see above.
> > > > >
> > > > > > Would setting up Rust symbols so that they have a crc built out of .rmeta be
> > > > > > sufficient for you to consider this useful? If not, can you help me understand
> > > > > > what level of precision would be required?
> > > > >
> > > > > What exactly does .rmeta have to do with the function signature?  That's
> > > > > all you care about here.
> > The .rmeta file contains the decisions the compiler made about layout
> > in the crate
> > you're interfacing with. For example, the choice to encode Bar
> > with a yxz field order would be written into the .rmeta file.
> 
> Ok, then yes, can you parse the .rmeta file to get that information?
> 
> > > > rmeta is generated per crate.
> > > >
> > > > CRC is computed per symbol.
> > > >
> > > > They have different granularity.
> > > > It is weird to refuse a module for incompatibility
> > > > of a symbol that it is not using at all.
> > >
> > > I agree, this should be on a per-symbol basis, so the Rust
> > > infrastructure in the kernel needs to be fixed up to support this
> > > properly, not just ignored like this patchset does.
> > I agree there is a divergence here, I tried to point it out so that it
> > wouldn't be
> > a surprise later. The .rmeta file itself (which is the only way we
> > could know that
> > the ABI actually matches, because layout decisions are in there) is an unstable
> > format, which is why I would be reluctant to try to parse it and find only the
> > relevant portions to hash. This isn't just a "technically unstable"
> > format, but one
> > in which the compiler essentially just serializes out relevant internal data
> > structures, so any parser for it will involve significant alterations
> > on compiler
> > updates, which doesn't seem like a good plan.
> > >
> > > thanks,
> > >
> > > greg k-h
> > Given the above additional information, would you be interested in a patchset
> > which either:
> > 
> > A. Computes the CRC off the Rust type signature, knowing the compiler is
> > allowed to change the ABI based on information not contained in the CRC.
> 
> No.
> 
> > B. Uses the CRC of the .rmeta file, knowing, as was pointed out, that this
> > effectively contains the ABI of every symbol in the compilation unit, as well
> > as inline functions and polymorphic functions.
> 
> No.
> 
> > If neither of these works, we likely can't turn on MODVERSIONS+RUST until
> > further work is done upstream in the compiler to export some of this data in
> > an at least semi-stable fashion.
> 
> Looks like you need something a bit more fine-grained, as pointed out
> above.  why not parse the structure/function information in the .rmeta
> file?  Is the format of that file not stable?

Or, step back and figure something else out that can detect the
structure and function signatures of rust code and determine a way to
notice when they change.  That's the goal here, you need to notice when
the code changes, perhaps just use libabigail as that will work on the
dwarf output?

Note, libabigail will miss things that the crc checker does not miss
(and the opposite is true as well) which is why some groups (i.e.
Android) use both on their kernel to ensure that nothing slips by.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-11-28  8:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-18  2:54 [PATCH v2 0/5] MODVERSIONS + RUST Redux Matthew Maurer
2023-11-18  2:54 ` [PATCH v2 1/5] export_report: Rehabilitate script Matthew Maurer
2023-11-18 11:35   ` Greg KH
2023-11-18  2:54 ` [PATCH v2 2/5] modules: Refactor + kdoc elf_validity_cached_copy Matthew Maurer
2023-11-18 11:36   ` Greg KH
2023-11-18  2:54 ` [PATCH v2 3/5] modpost: Extended modversion support Matthew Maurer
2023-11-18 13:42   ` kernel test robot
2023-11-18  2:54 ` [PATCH v2 4/5] rust: Allow MODVERSIONS Matthew Maurer
2023-11-18  2:54 ` [PATCH v2 5/5] export_report: Use new version info format Matthew Maurer
2023-11-22 15:49 ` [PATCH v2 0/5] MODVERSIONS + RUST Redux Masahiro Yamada
2023-11-22 21:04   ` Matthew Maurer
2023-11-23  9:05     ` Greg KH
2023-11-23 11:38       ` Masahiro Yamada
2023-11-23 12:12         ` Greg KH
2023-11-27 19:27           ` Matthew Maurer
2023-11-28  8:05             ` Greg KH
2023-11-28  8:44               ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).