Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Masahiro Yamada <masahiroy@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	akpm@linux-foundation.org, jpoimboe@kernel.org, kees@kernel.org,
	rppt@kernel.org, graf@amazon.com,
	thomas.weissschuh@linutronix.de, sam@gentoo.org,
	mgorny@gentoo.org, dhowells@redhat.com, elsk@google.com
Subject: [PATCH AUTOSEL 6.16-6.15] kheaders: rebuild kheaders_data.tar.xz when a file is modified within a minute
Date: Sat,  9 Aug 2025 20:20:51 -0400	[thread overview]
Message-ID: <20250810002104.1545396-2-sashal@kernel.org> (raw)
In-Reply-To: <20250810002104.1545396-1-sashal@kernel.org>

From: Masahiro Yamada <masahiroy@kernel.org>

[ Upstream commit 626c54af35764b0b8a4ed5c446458ba6ddfe9cc8 ]

When a header file is changed, kernel/gen_kheaders.sh may fail to update
kernel/kheaders_data.tar.xz.

[steps to reproduce]

[1] Build kernel/kheaders_data.tar.xz

  $ make -j$(nproc) kernel/kheaders.o
    DESCEND objtool
    INSTALL libsubcmd_headers
    CALL    scripts/checksyscalls.sh
    CHK     kernel/kheaders_data.tar.xz
    GEN     kernel/kheaders_data.tar.xz
    CC      kernel/kheaders.o

[2] Modify a header without changing the file size

  $ sed -i s/0xdeadbeef/0xfeedbeef/ include/linux/elfnote.h

[3] Rebuild kernel/kheaders_data.tar.xz

  $ make -j$(nproc) kernel/kheaders.o
    DESCEND objtool
    INSTALL libsubcmd_headers
    CALL    scripts/checksyscalls.sh
    CHK     kernel/kheaders_data.tar.xz

kernel/kheaders_data.tar.xz is not updated if steps [1] - [3] are run
within the same minute.

The headers_md5 variable stores the MD5 hash of the 'ls -l' output
for all header files. This hash value is used to determine whether
kheaders_data.tar.xz needs to be rebuilt. However, 'ls -l' prints the
modification times with minute-level granularity. If a file is modified
within the same minute and its size remains the same, the MD5 hash does
not change.

To reliably detect file modifications, this commit rewrites
kernel/gen_kheaders.sh to output header dependencies to
kernel/.kheaders_data.tar.xz.cmd. Then, Make compares the timestamps
and reruns kernel/gen_kheaders.sh when necessary. This is the standard
mechanism used by Make and Kbuild.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees for the
following reasons:

## Bug Fix Nature
The commit fixes a genuine bug where header file modifications fail to
trigger a rebuild of `kheaders_data.tar.xz` when changes occur within
the same minute. This is a **reliability issue** that affects the
correctness of the kernel build system, particularly for
CONFIG_IKHEADERS functionality.

## Specific Problem Being Fixed
The previous implementation had a **race condition** where:
- The MD5 hash was calculated from `ls -l` output which shows
  modification times with minute-level granularity
- If a file was modified within the same minute without changing its
  size, the MD5 hash wouldn't change
- This caused the build system to miss legitimate file changes,
  resulting in stale header archives

## Impact on Users
This bug directly affects:
1. Kernel developers doing incremental builds
2. Tools and scripts that rely on CONFIG_IKHEADERS having accurate
   header information
3. Automated build systems that may make rapid successive changes

## Size and Complexity Analysis
The changes are:
- **Contained to a single subsystem** (kheaders build infrastructure)
- **Relatively small** (3 files, +66/-77 lines net change)
- **Uses standard Kbuild mechanisms** - the fix replaces the custom
  MD5-based change detection with Make's standard dependency tracking
  via `.cmd` files
- **No architectural changes** - just improves the existing build
  dependency tracking

## Risk Assessment
The fix has **minimal regression risk** because:
1. It replaces a brittle custom mechanism with the well-tested standard
   Kbuild dependency tracking
2. The changes are isolated to the kheaders build process
3. It doesn't affect runtime behavior, only build-time behavior
4. The new implementation is actually simpler and more maintainable than
   the old one

## Code Quality Improvements
The commit also brings these benefits:
- Removes 77 lines of complex shell script logic
- Uses standard Make dependency files (`.kheaders_data.tar.xz.cmd`)
- Properly tracks both source files (`kheaders-srclist`) and object
  files (`kheaders-objlist`)
- More reliable change detection using file timestamps rather than MD5
  of `ls -l` output

## Stable Tree Criteria
This commit meets stable tree requirements:
- ✅ Fixes a real bug that affects users (incremental builds failing to
  detect changes)
- ✅ Small and contained changes
- ✅ No new features introduced
- ✅ Uses existing, proven mechanisms (standard Make dependency tracking)
- ✅ Low risk of regression
- ✅ Clear reproduction steps provided in commit message

The commit is particularly important for stable because build system
reliability is crucial for kernel development workflows, and this bug
could lead to subtle issues where developers think they're testing
updated headers when they're actually using stale ones.

 kernel/.gitignore      |  2 +
 kernel/Makefile        | 47 ++++++++++++++++++---
 kernel/gen_kheaders.sh | 94 ++++++++++--------------------------------
 3 files changed, 66 insertions(+), 77 deletions(-)

diff --git a/kernel/.gitignore b/kernel/.gitignore
index c6b299a6b786..a501bfc80694 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -1,3 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /config_data
 /kheaders.md5
+/kheaders-objlist
+/kheaders-srclist
diff --git a/kernel/Makefile b/kernel/Makefile
index 32e80dd626af..9a9ff405ea89 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -158,11 +158,48 @@ filechk_cat = cat $<
 $(obj)/config_data: $(KCONFIG_CONFIG) FORCE
 	$(call filechk,cat)
 
+# kheaders_data.tar.xz
 $(obj)/kheaders.o: $(obj)/kheaders_data.tar.xz
 
-quiet_cmd_genikh = CHK     $(obj)/kheaders_data.tar.xz
-      cmd_genikh = $(CONFIG_SHELL) $(srctree)/kernel/gen_kheaders.sh $@
-$(obj)/kheaders_data.tar.xz: FORCE
-	$(call cmd,genikh)
+quiet_cmd_kheaders_data = GEN     $@
+      cmd_kheaders_data = "$<" "$@" "$(obj)/kheaders-srclist" "$(obj)/kheaders-objlist"
+      cmd_kheaders_data_dep = cat $(depfile) >> $(dot-target).cmd; rm -f $(depfile)
 
-clean-files := kheaders_data.tar.xz kheaders.md5
+define rule_kheaders_data
+	$(call cmd_and_savecmd,kheaders_data)
+	$(call cmd,kheaders_data_dep)
+endef
+
+targets += kheaders_data.tar.xz
+$(obj)/kheaders_data.tar.xz: $(src)/gen_kheaders.sh $(obj)/kheaders-srclist $(obj)/kheaders-objlist $(obj)/kheaders.md5 FORCE
+	$(call if_changed_rule,kheaders_data)
+
+# generated headers in objtree
+#
+# include/generated/utsversion.h is ignored because it is generated
+# after gen_kheaders.sh is executed. (utsversion.h is unneeded for kheaders)
+filechk_kheaders_objlist = \
+	for d in include "arch/$(SRCARCH)/include"; do \
+		find "$${d}/generated" ! -path "include/generated/utsversion.h" -a -name "*.h" -print; \
+	done
+
+$(obj)/kheaders-objlist: FORCE
+	$(call filechk,kheaders_objlist)
+
+# non-generated headers in srctree
+filechk_kheaders_srclist = \
+	for d in include "arch/$(SRCARCH)/include"; do \
+		find "$(srctree)/$${d}" -path "$(srctree)/$${d}/generated" -prune -o -name "*.h" -print; \
+	done
+
+$(obj)/kheaders-srclist: FORCE
+	$(call filechk,kheaders_srclist)
+
+# Some files are symlinks. If symlinks are changed, kheaders_data.tar.xz should
+# be rebuilt.
+filechk_kheaders_md5sum = xargs -r -a $< stat -c %N | md5sum
+
+$(obj)/kheaders.md5: $(obj)/kheaders-srclist FORCE
+	$(call filechk,kheaders_md5sum)
+
+clean-files := kheaders.md5 kheaders-srclist kheaders-objlist
diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh
index c9e5dc068e85..0ff7beabb21a 100755
--- a/kernel/gen_kheaders.sh
+++ b/kernel/gen_kheaders.sh
@@ -4,79 +4,33 @@
 # This script generates an archive consisting of kernel headers
 # for CONFIG_IKHEADERS.
 set -e
-sfile="$(readlink -f "$0")"
-outdir="$(pwd)"
 tarfile=$1
-tmpdir=$outdir/${tarfile%/*}/.tmp_dir
-
-dir_list="
-include/
-arch/$SRCARCH/include/
-"
-
-# Support incremental builds by skipping archive generation
-# if timestamps of files being archived are not changed.
-
-# This block is useful for debugging the incremental builds.
-# Uncomment it for debugging.
-# if [ ! -f /tmp/iter ]; then iter=1; echo 1 > /tmp/iter;
-# else iter=$(($(cat /tmp/iter) + 1)); echo $iter > /tmp/iter; fi
-# find $all_dirs -name "*.h" | xargs ls -l > /tmp/ls-$iter
-
-all_dirs=
-if [ "$building_out_of_srctree" ]; then
-	for d in $dir_list; do
-		all_dirs="$all_dirs $srctree/$d"
-	done
-fi
-all_dirs="$all_dirs $dir_list"
-
-# include/generated/utsversion.h is ignored because it is generated after this
-# script is executed. (utsversion.h is unneeded for kheaders)
-#
-# When Kconfig regenerates include/generated/autoconf.h, its timestamp is
-# updated, but the contents might be still the same. When any CONFIG option is
-# changed, Kconfig touches the corresponding timestamp file include/config/*.
-# Hence, the md5sum detects the configuration change anyway. We do not need to
-# check include/generated/autoconf.h explicitly.
-#
-# Ignore them for md5 calculation to avoid pointless regeneration.
-headers_md5="$(find $all_dirs -name "*.h" -a			\
-		! -path include/generated/utsversion.h -a	\
-		! -path include/generated/autoconf.h		|
-		xargs ls -l | md5sum | cut -d ' ' -f1)"
-
-# Any changes to this script will also cause a rebuild of the archive.
-this_file_md5="$(ls -l $sfile | md5sum | cut -d ' ' -f1)"
-if [ -f $tarfile ]; then tarfile_md5="$(md5sum $tarfile | cut -d ' ' -f1)"; fi
-if [ -f kernel/kheaders.md5 ] &&
-	[ "$(head -n 1 kernel/kheaders.md5)" = "$headers_md5" ] &&
-	[ "$(head -n 2 kernel/kheaders.md5 | tail -n 1)" = "$this_file_md5" ] &&
-	[ "$(tail -n 1 kernel/kheaders.md5)" = "$tarfile_md5" ]; then
-		exit
-fi
-
-echo "  GEN     $tarfile"
+srclist=$2
+objlist=$3
+
+dir=$(dirname "${tarfile}")
+tmpdir=${dir}/.tmp_dir
+depfile=${dir}/.$(basename "${tarfile}").d
+
+# generate dependency list.
+{
+	echo
+	echo "deps_${tarfile} := \\"
+	sed 's:\(.*\):  \1 \\:' "${srclist}"
+	sed -n '/^include\/generated\/autoconf\.h$/!s:\(.*\):  \1 \\:p' "${objlist}"
+	echo
+	echo "${tarfile}: \$(deps_${tarfile})"
+	echo
+	echo "\$(deps_${tarfile}):"
+
+} > "${depfile}"
 
 rm -rf "${tmpdir}"
 mkdir "${tmpdir}"
 
-if [ "$building_out_of_srctree" ]; then
-	(
-		cd $srctree
-		for f in $dir_list
-			do find "$f" -name "*.h";
-		done | tar -c -f - -T - | tar -xf - -C "${tmpdir}"
-	)
-fi
-
-for f in $dir_list;
-	do find "$f" -name "*.h";
-done | tar -c -f - -T - | tar -xf - -C "${tmpdir}"
-
-# Always exclude include/generated/utsversion.h
-# Otherwise, the contents of the tarball may vary depending on the build steps.
-rm -f "${tmpdir}/include/generated/utsversion.h"
+# shellcheck disable=SC2154 # srctree is passed as an env variable
+sed "s:^${srctree}/::" "${srclist}" | tar -c -f - -C "${srctree}" -T - | tar -xf - -C "${tmpdir}"
+tar -c -f - -T "${objlist}" | tar -xf - -C "${tmpdir}"
 
 # Remove comments except SDPX lines
 # Use a temporary file to store directory contents to prevent find/xargs from
@@ -92,8 +46,4 @@ tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \
     --owner=0 --group=0 --sort=name --numeric-owner --mode=u=rw,go=r,a+X \
     -I $XZ -cf $tarfile -C "${tmpdir}/" . > /dev/null
 
-echo $headers_md5 > kernel/kheaders.md5
-echo "$this_file_md5" >> kernel/kheaders.md5
-echo "$(md5sum $tarfile | cut -d ' ' -f1)" >> kernel/kheaders.md5
-
 rm -rf "${tmpdir}"
-- 
2.39.5


  reply	other threads:[~2025-08-10  0:21 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-10  0:20 [PATCH AUTOSEL 6.16-5.4] kconfig: gconf: avoid hardcoding model2 in on_treeview2_cursor_changed() Sasha Levin
2025-08-10  0:20 ` Sasha Levin [this message]
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.4] kconfig: lxdialog: fix 'space' to (de)select options Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.4] scsi: aacraid: Stop using PCI_IRQ_AFFINITY Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.4] kconfig: gconf: fix potential memory leak in renderer_edited() Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.15] scsi: target: core: Generate correct identifiers for PR OUT transport IDs Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.4] ipmi: Fix strcpy source and destination the same Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.4] scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-6.1] vfio/mlx5: fix possible overflow in tracking max message size Sasha Levin
2025-08-10  0:20 ` [PATCH AUTOSEL 6.16-5.4] kconfig: nconf: Ensure null termination where strncpy is used Sasha Levin
2025-08-10  0:21 ` [PATCH AUTOSEL 6.16-5.4] kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c Sasha Levin
2025-08-10  0:21 ` [PATCH AUTOSEL 6.16-5.15] vfio/type1: conditional rescheduling while pinning Sasha Levin
2025-08-10  0:21 ` [PATCH AUTOSEL 6.16-5.4] ipmi: Use dev_warn_ratelimited() for incorrect message warnings Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250810002104.1545396-2-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=elsk@google.com \
    --cc=graf@amazon.com \
    --cc=jpoimboe@kernel.org \
    --cc=kees@kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=mgorny@gentoo.org \
    --cc=patches@lists.linux.dev \
    --cc=rppt@kernel.org \
    --cc=sam@gentoo.org \
    --cc=stable@vger.kernel.org \
    --cc=thomas.weissschuh@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox