From: Stefano Tondo <stondo@gmail.com>
To: openembedded-core@lists.openembedded.org
Cc: stefano.tondo.ext@siemens.com, adrian.freihofer@siemens.com,
Peter.Marko@siemens.com, jpewhacker@gmail.com,
Ross.Burton@arm.com
Subject: [PATCH 01/14] spdx30: Add configurable file filtering support
Date: Sat, 21 Feb 2026 05:24:05 +0100 [thread overview]
Message-ID: <20260221042418.317535-2-stondo@gmail.com> (raw)
In-Reply-To: <20260221042418.317535-1-stondo@gmail.com>
From: Stefano Tondo <stefano.tondo.ext@siemens.com>
This commit adds file filtering capabilities to SPDX 3.0 SBOM generation
to reduce SBOM size and focus on relevant files.
New configuration variables (in spdx-common.bbclass):
SPDX_FILE_FILTER (default: "all"):
- "all": Include all files (current behavior)
- "essential": Include only LICENSE/README/NOTICE files
- "none": Skip all files
SPDX_FILE_ESSENTIAL_PATTERNS (extensible):
- Space-separated patterns for essential files
- Default: LICENSE COPYING README NOTICE COPYRIGHT etc.
- Recipes can extend: SPDX_FILE_ESSENTIAL_PATTERNS += "MANIFEST"
SPDX_FILE_EXCLUDE_PATTERNS (extensible):
- Patterns to exclude in 'essential' mode
- Default: .patch .diff test_ /tests/ .pyc .o etc.
- Recipes can extend: SPDX_FILE_EXCLUDE_PATTERNS += ".tmp"
Implementation (in spdx30_tasks.py):
- add_package_files(): Apply filtering during file walk
- get_package_sources_from_debug(): Skip debug source lookup for
filtered files instead of failing
Impact:
- Essential mode reduces file components by ~96% (2,376 → ~90 files)
- Filters out patches, test files, and build artifacts
- Configurable per-recipe via variable extension
- No impact when SPDX_FILE_FILTER="all" (default)
This is useful for creating compact SBOMs for compliance and distribution
where only license-relevant files are needed.
Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
meta/classes/spdx-common.bbclass | 37 +++++++++++++++++++++++++++
meta/lib/oe/spdx30_tasks.py | 44 +++++++++++++++++++++++++++++---
2 files changed, 77 insertions(+), 4 deletions(-)
diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
index 3110230c9e..81c61e10dc 100644
--- a/meta/classes/spdx-common.bbclass
+++ b/meta/classes/spdx-common.bbclass
@@ -54,6 +54,43 @@ SPDX_CONCLUDED_LICENSE[doc] = "The license concluded by manual or external \
SPDX_MULTILIB_SSTATE_ARCHS ??= "${SSTATE_ARCHS}"
+SPDX_FILES_INCLUDED ??= "all"
+SPDX_FILES_INCLUDED[doc] = "Controls which files are included in SPDX output. \
+ Values: 'all' (include all files), 'essential' (only LICENSE/README/NOTICE files), \
+ 'none' (no files). The 'essential' mode reduces SBOM size by excluding patches, \
+ tests, and build artifacts."
+
+SPDX_FILE_ESSENTIAL_PATTERNS ??= "LICENSE COPYING README NOTICE COPYRIGHT PATENTS ACKNOWLEDGEMENTS THIRD-PARTY-NOTICES"
+SPDX_FILE_ESSENTIAL_PATTERNS[doc] = "Space-separated list of file name patterns to \
+ include when SPDX_FILES_INCLUDED='essential'. Recipes can extend this to add their \
+ own essential files (e.g., 'SPDX_FILE_ESSENTIAL_PATTERNS += \"MANIFEST\"')."
+
+SPDX_FILE_EXCLUDE_PATTERNS ??= ".patch .diff test_ _test. /test/ /tests/ .pyc .pyo .o .a .la"
+SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of patterns to exclude when \
+ SPDX_FILES_INCLUDED='essential'. Files matching these patterns are filtered out. \
+ Recipes can extend this to exclude additional file types."
+
+SBOM_COMPONENT_NAME ??= ""
+SBOM_COMPONENT_NAME[doc] = "Name of the SBOM metadata component. If set, creates a \
+ software_Package element in the SBOM with image/product information. Typically \
+ set to IMAGE_BASENAME or product name."
+
+SBOM_COMPONENT_VERSION ??= "${DISTRO_VERSION}"
+SBOM_COMPONENT_VERSION[doc] = "Version of the SBOM metadata component. Used when \
+ SBOM_COMPONENT_NAME is set. Defaults to DISTRO_VERSION."
+
+SBOM_COMPONENT_SUMMARY ??= ""
+SBOM_COMPONENT_SUMMARY[doc] = "Description of the SBOM metadata component. Used when \
+ SBOM_COMPONENT_NAME is set. Typically set to IMAGE_SUMMARY or product description."
+
+SBOM_SUPPLIER_NAME ??= ""
+SBOM_SUPPLIER_NAME[doc] = "Name of the organization supplying the SBOM. If set, \
+ creates an Organization element in the SBOM with supplier information."
+
+SBOM_SUPPLIER_URL ??= ""
+SBOM_SUPPLIER_URL[doc] = "URL of the organization supplying the SBOM. Used when \
+ SBOM_SUPPLIER_NAME is set. Adds an external identifier with the organization URL."
+
python () {
from oe.cve_check import extend_cve_status
extend_cve_status(d)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 99f2892dfb..bd703b5bec 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -161,6 +161,11 @@ def add_package_files(
compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
+ # File filtering configuration
+ spdx_file_filter = (d.getVar("SPDX_FILE_FILTER") or "all").lower()
+ essential_patterns = (d.getVar("SPDX_FILE_ESSENTIAL_PATTERNS") or "").split()
+ exclude_patterns = (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
+
for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
dirs[:] = [d for d in dirs if d not in ignore_dirs]
if subdir == str(topdir):
@@ -174,6 +179,26 @@ def add_package_files(
continue
filename = str(filepath.relative_to(topdir))
+
+ # Apply file filtering if enabled
+ if spdx_file_filter == "essential":
+ file_upper = file.upper()
+ filename_lower = filename.lower()
+
+ # Skip if matches exclude patterns
+ skip_file = any(pattern in filename_lower for pattern in exclude_patterns)
+ if skip_file:
+ continue
+
+ # Keep only essential files (license/readme/etc)
+ is_essential = any(pattern in file_upper for pattern in essential_patterns)
+ if not is_essential:
+ continue
+ elif spdx_file_filter == "none":
+ # Skip all files
+ continue
+ # else: spdx_file_filter == "all" or any other value - include all files
+
file_purposes = get_purposes(filepath)
# Check if file is compiled
@@ -219,6 +244,8 @@ def add_package_files(
def get_package_sources_from_debug(
d, package, package_files, sources, source_hash_cache
):
+ spdx_file_filter = (d.getVar("SPDX_FILE_FILTER") or "all").lower()
+
def file_path_match(file_path, pkg_file):
if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
return True
@@ -251,10 +278,19 @@ def get_package_sources_from_debug(
continue
if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
- bb.fatal(
- "No package file found for %s in %s; SPDX found: %s"
- % (str(file_path), package, " ".join(p.name for p in package_files))
- )
+ # When file filtering is active, some files may be filtered out
+ # Skip debug source lookup instead of failing
+ if spdx_file_filter in ("none", "essential"):
+ bb.debug(
+ 1,
+ f"Skipping debug source lookup for {file_path} in {package} (filtered by SPDX_FILE_FILTER={spdx_file_filter})",
+ )
+ continue
+ else:
+ bb.fatal(
+ "No package file found for %s in %s; SPDX found: %s"
+ % (str(file_path), package, " ".join(p.name for p in package_files))
+ )
continue
for debugsrc in file_data["debugsrc"]:
--
2.53.0
next prev parent reply other threads:[~2026-02-21 4:24 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-21 4:24 [PATCH 00/14] spdx30: SBOM enrichment for PURL, metadata, and compliance Stefano Tondo
2026-02-21 4:24 ` Stefano Tondo [this message]
2026-02-21 4:24 ` [PATCH 02/14] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
2026-02-21 4:24 ` [PATCH 03/14] spdx30: Add ecosystem-specific PURL generation Stefano Tondo
2026-02-21 4:24 ` [PATCH 04/14] spdx30: Add version extraction from SRCREV for Git source components Stefano Tondo
2026-02-21 4:24 ` [PATCH 05/14] spdx30: Add SPDX_GIT_PURL_MAPPINGS for Git hosting Stefano Tondo
2026-02-21 4:24 ` [PATCH 06/14] sbom30: Fix object deduplication to preserve complete data Stefano Tondo
2026-02-21 4:24 ` [PATCH 07/14] spdx30: Enrich source downloads with external refs and PURLs Stefano Tondo
2026-02-21 4:24 ` [PATCH 08/14] spdx30: Include recipe base PURL in package external identifiers Stefano Tondo
2026-02-21 4:24 ` [PATCH 09/14] spdx30: Add image root metadata package with describes relationship Stefano Tondo
2026-02-21 4:24 ` [PATCH 10/14] spdx30_tasks: Fix non-deterministic BUILDNAME in image package version Stefano Tondo
2026-02-21 4:24 ` [PATCH 11/14] spdx30: Add rootfs version and dependency scope classification Stefano Tondo
2026-02-21 4:24 ` [PATCH 12/14] oeqa/selftest: Add test for download_location defensive handling Stefano Tondo
2026-02-21 4:24 ` [PATCH 13/14] spdx.py: Add test for version extraction patterns Stefano Tondo
2026-02-21 4:24 ` [PATCH 14/14] cve_check: Escape special characters in CPE 2.3 formatted strings Stefano Tondo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260221042418.317535-2-stondo@gmail.com \
--to=stondo@gmail.com \
--cc=Peter.Marko@siemens.com \
--cc=Ross.Burton@arm.com \
--cc=adrian.freihofer@siemens.com \
--cc=jpewhacker@gmail.com \
--cc=openembedded-core@lists.openembedded.org \
--cc=stefano.tondo.ext@siemens.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.