public inbox for openembedded-core@lists.openembedded.org
 help / color / mirror / Atom feed
* [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements
@ 2026-03-23 21:07 Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 1/4] spdx30: Add configurable file exclusion pattern support Stefano Tondo
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-23 21:07 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

This series enhances SPDX 3.0 SBOM generation with enriched
metadata and compliance-oriented controls for current master.

Changes since v12:

  - Respun the full series from scratch on current master to eliminate
    cross-patch churn introduced during a previous rebase: patches were
    modifying code that later patches in the same series changed again.
    The net diff is byte-identical to v12; only patch boundaries changed
    so each commit is now self-contained with no overlapping hunks.

Validated with:

  oe-selftest -r \
    spdx.SPDX30Check.test_packageconfig_spdx \
    spdx.SPDX30Check.test_download_location_defensive_handling \
    spdx.SPDX30Check.test_version_extraction_patterns

Stefano Tondo (4):
  spdx30: Add configurable file exclusion pattern support
  spdx30: Add supplier support for image and SDK SBOMs
  spdx30: Enrich source downloads with version and PURL
  oeqa/selftest: Add tests for source download enrichment

 meta/classes-recipe/cargo_common.bbclass |   3 +
 meta/classes-recipe/cpan.bbclass         |  11 +
 meta/classes-recipe/go-mod.bbclass       |   6 +
 meta/classes-recipe/npm.bbclass          |   7 +
 meta/classes-recipe/pypi.bbclass         |   6 +-
 meta/classes/create-spdx-3.0.bbclass     |  17 ++
 meta/classes/spdx-common.bbclass         |   7 +
 meta/lib/oe/spdx30_tasks.py              | 278 +++++++++++++++++------
 meta/lib/oeqa/selftest/cases/spdx.py     | 104 +++++++--
 9 files changed, 345 insertions(+), 94 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v13 1/4] spdx30: Add configurable file exclusion pattern support
  2026-03-23 21:07 [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements Stefano Tondo
@ 2026-03-23 21:07 ` Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 2/4] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-23 21:07 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_FILE_EXCLUDE_PATTERNS variable that allows filtering files from
SPDX output by regex matching. The variable accepts a space-separated
list of Python regular expressions; files whose paths match any pattern
(via re.search) are excluded.

When empty (the default), no filtering is applied and all files are
included, preserving existing behavior.

This enables users to reduce SBOM size by excluding files that are not
relevant for compliance (e.g., test files, object files, patches).

Excluded files are tracked in a set returned from add_package_files()
and passed to get_package_sources_from_debug(), which uses the set for
precise cross-checking rather than re-evaluating patterns.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/spdx-common.bbclass |  7 +++
 meta/lib/oe/spdx30_tasks.py      | 80 +++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
index 83f05579b6..40701730a6 100644
--- a/meta/classes/spdx-common.bbclass
+++ b/meta/classes/spdx-common.bbclass
@@ -82,6 +82,13 @@ SPDX_MULTILIB_SSTATE_ARCHS[doc] = "The list of sstate architectures to consider
     when collecting SPDX dependencies. This includes multilib architectures when \
     multilib is enabled. Defaults to SSTATE_ARCHS."
 
+SPDX_FILE_EXCLUDE_PATTERNS ??= ""
+SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of Python regular \
+    expressions to exclude files from SPDX output. Files whose paths match \
+    any pattern (via re.search) will be filtered out. Defaults to empty \
+    (no filtering). Example: \
+    SPDX_FILE_EXCLUDE_PATTERNS = '\\.patch$ \\.diff$ /test/ \\.pyc$ \\.o$'"
+
 python () {
     from oe.cve_check import extend_cve_status
     extend_cve_status(d)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 353d783fa2..68ed821a8c 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -13,6 +13,7 @@ import oe.spdx30
 import oe.spdx_common
 import oe.sdk
 import os
+import re
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -157,17 +158,27 @@ def add_package_files(
     file_counter = 1
     if not os.path.exists(topdir):
         bb.note(f"Skip {topdir}")
-        return spdx_files
+        return spdx_files, set()
 
     check_compiled_sources = d.getVar("SPDX_INCLUDE_COMPILED_SOURCES") == "1"
     if check_compiled_sources:
         compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
         bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
 
+    exclude_patterns = [
+        re.compile(pattern)
+        for pattern in (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
+    ]
+    excluded_files = set()
+
     for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
-        dirs[:] = [d for d in dirs if d not in ignore_dirs]
+        dirs[:] = [directory for directory in dirs if directory not in ignore_dirs]
         if subdir == str(topdir):
-            dirs[:] = [d for d in dirs if d not in ignore_top_level_dirs]
+            dirs[:] = [
+                directory
+                for directory in dirs
+                if directory not in ignore_top_level_dirs
+            ]
 
         dirs.sort()
         files.sort()
@@ -177,14 +188,19 @@ def add_package_files(
                 continue
 
             filename = str(filepath.relative_to(topdir))
+
+            if exclude_patterns and any(
+                pattern.search(filename) for pattern in exclude_patterns
+            ):
+                excluded_files.add(filename)
+                continue
+
             file_purposes = get_purposes(filepath)
 
-            # Check if file is compiled
-            if check_compiled_sources:
-                if not oe.spdx_common.is_compiled_source(
-                    filename, compiled_sources, types
-                ):
-                    continue
+            if check_compiled_sources and not oe.spdx_common.is_compiled_source(
+                filename, compiled_sources, types
+            ):
+                continue
 
             spdx_file = objset.new_file(
                 get_spdxid(file_counter),
@@ -218,12 +234,15 @@ def add_package_files(
 
     bb.debug(1, "Added %d files to %s" % (len(spdx_files), objset.doc._id))
 
-    return spdx_files
+    return spdx_files, excluded_files
 
 
 def get_package_sources_from_debug(
-    d, package, package_files, sources, source_hash_cache
+    d, package, package_files, sources, source_hash_cache, excluded_files=None
 ):
+    if excluded_files is None:
+        excluded_files = set()
+
     def file_path_match(file_path, pkg_file):
         if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
             return True
@@ -256,6 +275,12 @@ def get_package_sources_from_debug(
             continue
 
         if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
+            if file_path.lstrip("/") in excluded_files:
+                bb.debug(
+                    1,
+                    f"Skipping debug source lookup for excluded file {file_path} in {package}",
+                )
+                continue
             bb.fatal(
                 "No package file found for %s in %s; SPDX found: %s"
                 % (str(file_path), package, " ".join(p.name for p in package_files))
@@ -737,7 +762,7 @@ def create_spdx(d):
         bb.debug(1, "Adding source files to SPDX")
         oe.spdx_common.get_patched_src(d)
 
-        files = add_package_files(
+        files, _ = add_package_files(
             d,
             build_objset,
             spdx_workdir,
@@ -909,7 +934,7 @@ def create_spdx(d):
                 )
 
             bb.debug(1, "Adding package files to SPDX for package %s" % pkg_name)
-            package_files = add_package_files(
+            package_files, excluded_files = add_package_files(
                 d,
                 pkg_objset,
                 pkgdest / package,
@@ -932,7 +957,8 @@ def create_spdx(d):
 
             if include_sources:
                 debug_sources = get_package_sources_from_debug(
-                    d, package, package_files, dep_sources, source_hash_cache
+                    d, package, package_files, dep_sources, source_hash_cache,
+                    excluded_files=excluded_files,
                 )
                 debug_source_ids |= set(
                     oe.sbom30.get_element_link_id(d) for d in debug_sources
@@ -944,7 +970,7 @@ def create_spdx(d):
 
     if include_sources:
         bb.debug(1, "Adding sysroot files to SPDX")
-        sysroot_files = add_package_files(
+        sysroot_files, _ = add_package_files(
             d,
             build_objset,
             d.expand("${COMPONENTS_DIR}/${PACKAGE_ARCH}/${PN}"),
@@ -1326,18 +1352,18 @@ def create_image_spdx(d):
             image_filename = image["filename"]
             image_path = image_deploy_dir / image_filename
             if os.path.isdir(image_path):
-                a = add_package_files(
-                    d,
-                    objset,
-                    image_path,
-                    lambda file_counter: objset.new_spdxid(
-                        "imagefile", str(file_counter)
-                    ),
-                    lambda filepath: [],
-                    license_data=None,
-                    ignore_dirs=[],
-                    ignore_top_level_dirs=[],
-                    archive=None,
+                a, _ = add_package_files(
+                        d,
+                        objset,
+                        image_path,
+                        lambda file_counter: objset.new_spdxid(
+                            "imagefile", str(file_counter)
+                        ),
+                        lambda filepath: [],
+                        license_data=None,
+                        ignore_dirs=[],
+                        ignore_top_level_dirs=[],
+                        archive=None,
                 )
                 artifacts.extend(a)
             else:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v13 2/4] spdx30: Add supplier support for image and SDK SBOMs
  2026-03-23 21:07 [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 1/4] spdx30: Add configurable file exclusion pattern support Stefano Tondo
@ 2026-03-23 21:07 ` Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 3/4] spdx30: Enrich source downloads with version and PURL Stefano Tondo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-23 21:07 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand, Joshua Watt

Add SPDX_IMAGE_SUPPLIER and SPDX_SDK_SUPPLIER variables that allow
setting a supplier agent on image and SDK SBOM root elements using
the suppliedBy property.

These follow the existing SPDX_PACKAGE_SUPPLIER pattern and use the
standard agent variable system to define supplier information.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/create-spdx-3.0.bbclass | 10 ++++++++++
 meta/lib/oe/spdx30_tasks.py          | 23 ++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 7515f460c3..9a6606dce6 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -124,6 +124,16 @@ SPDX_ON_BEHALF_OF[doc] = "The base variable name to describe the Agent on who's
 SPDX_PACKAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
     is supplying artifacts produced by the build"
 
+SPDX_IMAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the image SBOM. The supplier will be set on all root elements \
+    of the image SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the image SBOM."
+
+SPDX_SDK_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the SDK SBOM. The supplier will be set on all root elements \
+    of the SDK SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the SDK SBOM."
+
 SPDX_PACKAGE_VERSION ??= "${PV}"
 SPDX_PACKAGE_VERSION[doc] = "The version of a package, software_packageVersion \
     in software_Package"
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 68ed821a8c..62a00069df 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -1449,6 +1449,16 @@ def create_image_sbom_spdx(d):
 
     objset, sbom = oe.sbom30.create_sbom(d, image_name, root_elements)
 
+    # Set supplier on root elements if SPDX_IMAGE_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_IMAGE_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(d, objset, spdx_path)
 
     def make_image_link(target_path, suffix):
@@ -1560,12 +1570,19 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
         d, toolchain_outputname, sorted(list(files)), [rootfs_objset]
     )
 
+    # Set supplier on root elements if SPDX_SDK_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_SDK_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(
         d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
     )
-
-
-def create_recipe_sbom(d, deploydir):
     sbom_name = d.getVar("SPDX_RECIPE_SBOM_NAME")
 
     recipe, recipe_objset = load_recipe_spdx(d)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v13 3/4] spdx30: Enrich source downloads with version and PURL
  2026-03-23 21:07 [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 1/4] spdx30: Add configurable file exclusion pattern support Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 2/4] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
@ 2026-03-23 21:07 ` Stefano Tondo
  2026-03-23 21:07 ` [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
  4 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-23 21:07 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add version extraction, PURL generation, and external references
to source download packages in SPDX 3.0 SBOMs:

- Extract version from SRCREV for Git sources (full SHA-1)
- Generate PURLs for Git sources on github.com by default
- Support custom mappings via SPDX_GIT_PURL_MAPPINGS variable
  (format: "domain:purl_type", split(':', 1) for parsing)
- Use ecosystem PURLs from SPDX_PACKAGE_URLS for non-Git
- Add VCS external references for Git downloads
- Add distribution external references for tarball downloads
- Parse Git URLs using urllib.parse
- Extract logic into _generate_git_purl() and
  _enrich_source_package() helpers

For non-Git sources, version is not set from PV since the recipe
version does not necessarily reflect the version of individual
downloaded files. Ecosystem PURLs (which include version) from
SPDX_PACKAGE_URLS are still used when available.

The SPDX_GIT_PURL_MAPPINGS variable allows configuring PURL
generation for self-hosted Git services (e.g., GitLab).
github.com is always mapped to pkg:github by default.

Add ecosystem-specific SPDX_PACKAGE_URLS to recipe classes:
- cargo_common.bbclass: pkg:cargo
- cpan.bbclass: pkg:cpan (with prefix stripping)
- go-mod.bbclass: pkg:golang
- npm.bbclass: pkg:npm (with prefix stripping)
- pypi.bbclass: pkg:pypi (with normalization)

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes-recipe/cargo_common.bbclass |   3 +
 meta/classes-recipe/cpan.bbclass         |  11 ++
 meta/classes-recipe/go-mod.bbclass       |   6 +
 meta/classes-recipe/npm.bbclass          |   7 +
 meta/classes-recipe/pypi.bbclass         |   6 +-
 meta/classes/create-spdx-3.0.bbclass     |   7 +
 meta/lib/oe/spdx30_tasks.py              | 175 +++++++++++++++++------
 7 files changed, 172 insertions(+), 43 deletions(-)

diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
index bc44ad7918..0d3edfe4a7 100644
--- a/meta/classes-recipe/cargo_common.bbclass
+++ b/meta/classes-recipe/cargo_common.bbclass
@@ -240,3 +240,6 @@ EXPORT_FUNCTIONS do_configure
 # https://github.com/rust-lang/libc/issues/3223
 # https://github.com/rust-lang/libc/pull/3175
 INSANE_SKIP:append = " 32bit-time"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:cargo/${BPN}@${PV} "
diff --git a/meta/classes-recipe/cpan.bbclass b/meta/classes-recipe/cpan.bbclass
index bb76a5b326..dbf44da9d2 100644
--- a/meta/classes-recipe/cpan.bbclass
+++ b/meta/classes-recipe/cpan.bbclass
@@ -68,4 +68,15 @@ cpan_do_install () {
 	done
 }
 
+# Generate ecosystem-specific Package URL for SPDX
+def cpan_spdx_name(d):
+    bpn = d.getVar('BPN')
+    if bpn.startswith('perl-'):
+        return bpn[5:]
+    elif bpn.startswith('libperl-'):
+        return bpn[8:]
+    return bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:cpan/${@cpan_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/go-mod.bbclass b/meta/classes-recipe/go-mod.bbclass
index a15dda8f0e..5b3cb2d8b9 100644
--- a/meta/classes-recipe/go-mod.bbclass
+++ b/meta/classes-recipe/go-mod.bbclass
@@ -32,3 +32,9 @@ do_compile[dirs] += "${B}/src/${GO_WORKDIR}"
 # Make go install unpack the module zip files in the module cache directory
 # before the license directory is polulated with license files.
 addtask do_compile before do_populate_lic
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
diff --git a/meta/classes-recipe/npm.bbclass b/meta/classes-recipe/npm.bbclass
index 344e8b4bec..7bb791d543 100644
--- a/meta/classes-recipe/npm.bbclass
+++ b/meta/classes-recipe/npm.bbclass
@@ -354,4 +354,11 @@ FILES:${PN} += " \
     ${nonarch_libdir} \
 "
 
+# Generate ecosystem-specific Package URL for SPDX
+def npm_spdx_name(d):
+    bpn = d.getVar('BPN')
+    return bpn[5:] if bpn.startswith('node-') else bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:npm/${@npm_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
index 9d46c035f6..e2d054af6d 100644
--- a/meta/classes-recipe/pypi.bbclass
+++ b/meta/classes-recipe/pypi.bbclass
@@ -43,7 +43,8 @@ SECTION = "devel/python"
 SRC_URI:prepend = "${PYPI_SRC_URI} "
 S = "${UNPACKDIR}/${PYPI_PACKAGE}-${PV}"
 
-UPSTREAM_CHECK_PYPI_PACKAGE ?= "${PYPI_PACKAGE}"
+# Replace any '_' characters in the pypi URI with '-'s to follow the PyPi website naming conventions
+UPSTREAM_CHECK_PYPI_PACKAGE ?= "${@pypi_normalize(d)}"
 
 # Use the simple repository API rather than the potentially unstable project URL
 # More information on the pypi API specification is avaialble here:
@@ -54,3 +55,6 @@ UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${@pypi_normalize(d)}/"
 UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz|zip|tar\.bz2)"
 
 CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:pypi/${@pypi_normalize(d)}@${PV} "
diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 9a6606dce6..265dc525bc 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -156,6 +156,13 @@ SPDX_RECIPE_SBOM_NAME ?= "${PN}-recipe-sbom"
 SPDX_RECIPE_SBOM_NAME[doc] = "The name of output recipe SBoM when using \
     create_recipe_sbom"
 
+SPDX_GIT_PURL_MAPPINGS ??= ""
+SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
+    mappings to configure PURL generation for Git source downloads. \
+    For example, "gitlab.example.com:pkg:gitlab" maps repositories hosted \
+    on gitlab.example.com to the pkg:gitlab PURL type. \
+    github.com is always mapped to pkg:github by default."
+
 IMAGE_CLASSES:append = " create-spdx-image-3.0"
 SDK_CLASSES += "create-spdx-sdk-3.0"
 
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 62a00069df..6f0bdba975 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -14,6 +14,7 @@ import oe.spdx_common
 import oe.sdk
 import os
 import re
+import urllib.parse
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -384,6 +385,120 @@ def collect_dep_sources(dep_objsets, dest):
             index_sources_by_hash(e.to, dest)
 
 
+def _generate_git_purl(d, download_location, srcrev):
+    """Generate a Package URL for a Git source from its download location.
+
+    Parses the Git URL to identify the hosting service and generates the
+    appropriate PURL type. Supports github.com by default and custom
+    mappings via SPDX_GIT_PURL_MAPPINGS.
+
+    Returns the PURL string or None if no mapping matches.
+    """
+    if not download_location or not download_location.startswith('git+'):
+        return None
+
+    git_url = download_location[4:]  # Remove 'git+' prefix
+
+    # Default handler: github.com
+    git_purl_handlers = {
+        'github.com': 'pkg:github',
+    }
+
+    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
+    # Format: "domain1:purl_type1 domain2:purl_type2"
+    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+    if custom_mappings:
+        for mapping in custom_mappings.split():
+            parts = mapping.split(':', 1)
+            if len(parts) == 2:
+                git_purl_handlers[parts[0]] = parts[1]
+                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
+            else:
+                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+    try:
+        parsed = urllib.parse.urlparse(git_url)
+    except Exception:
+        return None
+
+    hostname = parsed.hostname
+    if not hostname:
+        return None
+
+    for domain, purl_type in git_purl_handlers.items():
+        if hostname == domain:
+            path = parsed.path.strip('/')
+            path_parts = path.split('/')
+            if len(path_parts) >= 2:
+                owner = path_parts[0]
+                repo = path_parts[1].replace('.git', '')
+                return f"{purl_type}/{owner}/{repo}@{srcrev}"
+            break
+
+    return None
+
+
+def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
+    """Enrich a source download package with version, PURL, and external refs.
+
+    Extracts version from SRCREV for Git sources, generates PURLs for
+    known hosting services, and adds external references for VCS,
+    distribution URLs, and homepage.
+    """
+    version = None
+    purl = None
+
+    if fd.type == "git":
+        # Use full SHA-1 from fd.revision
+        srcrev = getattr(fd, 'revision', None)
+        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
+            version = srcrev
+
+        # Generate PURL for Git hosting services
+        download_location = getattr(dl, 'software_downloadLocation', None)
+        if version and download_location:
+            purl = _generate_git_purl(d, download_location, version)
+    else:
+        # Use ecosystem PURL from SPDX_PACKAGE_URLS if available
+        package_urls = (d.getVar('SPDX_PACKAGE_URLS') or '').split()
+        for url in package_urls:
+            if not url.startswith('pkg:yocto'):
+                purl = url
+                break
+
+    if version:
+        dl.software_packageVersion = version
+
+    if purl:
+        dl.software_packageUrl = purl
+
+    # Add external references
+    download_location = getattr(dl, 'software_downloadLocation', None)
+    if download_location and isinstance(download_location, str):
+        dl.externalRef = dl.externalRef or []
+
+        if download_location.startswith('git+'):
+            # VCS reference for Git repositories
+            git_url = download_location[4:]
+            if '@' in git_url:
+                git_url = git_url.split('@')[0]
+
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.vcs,
+                    locator=[git_url],
+                )
+            )
+        elif download_location.startswith(('http://', 'https://', 'ftp://')):
+            # Distribution reference for tarball/archive downloads
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.altDownloadLocation,
+                    locator=[download_location],
+                )
+            )
+
+
 def add_download_files(d, objset):
     inputs = set()
 
@@ -447,10 +562,14 @@ def add_download_files(d, objset):
                 )
             )
 
+            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
+
             if fd.method.supports_checksum(fd):
                 # TODO Need something better than hard coding this
                 for checksum_id in ["sha256", "sha1"]:
-                    expected_checksum = getattr(fd, "%s_expected" % checksum_id, None)
+                    expected_checksum = getattr(
+                        fd, "%s_expected" % checksum_id, None
+                    )
                     if expected_checksum is None:
                         continue
 
@@ -506,7 +625,6 @@ def get_is_native(d):
 
 def create_recipe_spdx(d):
     deploydir = Path(d.getVar("SPDXRECIPEDEPLOY"))
-    deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
     pn = d.getVar("PN")
 
     license_data = oe.spdx_common.load_spdx_license_data(d)
@@ -541,20 +659,6 @@ def create_recipe_spdx(d):
 
     set_purls(recipe, (d.getVar("SPDX_PACKAGE_URLS") or "").split())
 
-    # TODO: This doesn't work before do_unpack because the license text has to
-    # be available for recipes with NO_GENERIC_LICENSE
-    # recipe_spdx_license = add_license_expression(
-    #    d,
-    #    recipe_objset,
-    #    d.getVar("LICENSE"),
-    #    license_data,
-    # )
-    # recipe_objset.new_relationship(
-    #    [recipe],
-    #    oe.spdx30.RelationshipType.hasDeclaredLicense,
-    #    [oe.sbom30.get_element_link_id(recipe_spdx_license)],
-    # )
-
     if val := d.getVar("HOMEPAGE"):
         recipe.software_homePage = val
 
@@ -588,7 +692,6 @@ def create_recipe_spdx(d):
             sorted(oe.sbom30.get_element_link_id(dep) for dep in dep_recipes),
         )
 
-    # Add CVEs
     cve_by_status = {}
     if include_vex != "none":
         patched_cves = oe.cve_check.get_patched_cves(d)
@@ -598,8 +701,6 @@ def create_recipe_spdx(d):
             description = patched_cve.get("justification", None)
             resources = patched_cve.get("resource", [])
 
-            # If this CVE is fixed upstream, skip it unless all CVEs are
-            # specified.
             if include_vex != "all" and detail in (
                 "fixed-version",
                 "cpe-stable-backport",
@@ -692,7 +793,6 @@ def create_recipe_spdx(d):
 
 
 def load_recipe_spdx(d):
-
     return oe.sbom30.find_root_obj_in_jsonld(
         d,
         "static",
@@ -717,10 +817,8 @@ def create_spdx(d):
 
     pn = d.getVar("PN")
     deploydir = Path(d.getVar("SPDXDEPLOY"))
-    deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
     spdx_workdir = Path(d.getVar("SPDXWORK"))
     include_sources = d.getVar("SPDX_INCLUDE_SOURCES") == "1"
-    pkg_arch = d.getVar("SSTATE_PKGARCH")
     is_native = get_is_native(d)
 
     recipe, recipe_objset = load_recipe_spdx(d)
@@ -783,7 +881,6 @@ def create_spdx(d):
     dep_objsets, dep_builds = collect_dep_objsets(
         d, direct_deps, "builds", "build-", oe.spdx30.build_Build
     )
-
     if dep_builds:
         build_objset.new_scoped_relationship(
             [build],
@@ -919,9 +1016,7 @@ def create_spdx(d):
 
             # Add concluded license relationship if manually set
             # Only add when license analysis has been explicitly performed
-            concluded_license_str = d.getVar(
-                "SPDX_CONCLUDED_LICENSE:%s" % package
-            ) or d.getVar("SPDX_CONCLUDED_LICENSE")
+            concluded_license_str = d.getVar("SPDX_CONCLUDED_LICENSE:%s" % package) or d.getVar("SPDX_CONCLUDED_LICENSE")
             if concluded_license_str:
                 concluded_spdx_license = add_license_expression(
                     d, build_objset, concluded_license_str, license_data
@@ -1011,13 +1106,12 @@ def create_spdx(d):
                 status = "enabled" if feature in enabled else "disabled"
                 build.build_parameter.append(
                     oe.spdx30.DictionaryEntry(
-                        key=f"PACKAGECONFIG:{feature}", value=status
+                        key=f"PACKAGECONFIG:{feature}",
+                        value=status
                     )
                 )
 
-            bb.note(
-                f"Added PACKAGECONFIG entries: {len(enabled)} enabled, {len(disabled)} disabled"
-            )
+            bb.note(f"Added PACKAGECONFIG entries: {len(enabled)} enabled, {len(disabled)} disabled")
 
     oe.sbom30.write_recipe_jsonld_doc(d, build_objset, "builds", deploydir)
 
@@ -1025,9 +1119,7 @@ def create_spdx(d):
 def create_package_spdx(d):
     deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
     deploydir = Path(d.getVar("SPDXRUNTIMEDEPLOY"))
-
     direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_spdx")
-
     providers = oe.spdx_common.collect_package_providers(d, direct_deps)
     pkg_arch = d.getVar("SSTATE_PKGARCH")
 
@@ -1205,15 +1297,15 @@ def write_bitbake_spdx(d):
 def collect_build_package_inputs(d, objset, build, packages, files_by_hash=None):
     import oe.sbom30
 
-    direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_spdx")
-
+    direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_package_spdx")
     providers = oe.spdx_common.collect_package_providers(d, direct_deps)
 
     build_deps = set()
+    missing_providers = set()
 
     for name in sorted(packages.keys()):
         if name not in providers:
-            bb.note(f"Unable to find SPDX provider for '{name}'")
+            missing_providers.add(name)
             continue
 
         pkg_name, pkg_hashfn = providers[name]
@@ -1232,6 +1324,11 @@ def collect_build_package_inputs(d, objset, build, packages, files_by_hash=None)
             for h, f in pkg_objset.by_sha256_hash.items():
                 files_by_hash.setdefault(h, set()).update(f)
 
+    if missing_providers:
+        bb.fatal(
+            f"Unable to find SPDX provider(s) for: {', '.join(sorted(missing_providers))}"
+        )
+
     if build_deps:
         objset.new_scoped_relationship(
             [build],
@@ -1390,6 +1487,7 @@ def create_image_spdx(d):
 
                 set_timestamp_now(d, a, "builtTime")
 
+
         if artifacts:
             objset.new_scoped_relationship(
                 [image_build],
@@ -1583,10 +1681,3 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
     oe.sbom30.write_jsonld_doc(
         d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
     )
-    sbom_name = d.getVar("SPDX_RECIPE_SBOM_NAME")
-
-    recipe, recipe_objset = load_recipe_spdx(d)
-
-    objset, sbom = oe.sbom30.create_sbom(d, sbom_name, [recipe], [recipe_objset])
-
-    oe.sbom30.write_jsonld_doc(d, objset, deploydir / (sbom_name + ".spdx.json"))
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment
  2026-03-23 21:07 [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements Stefano Tondo
                   ` (2 preceding siblings ...)
  2026-03-23 21:07 ` [PATCH v13 3/4] spdx30: Enrich source downloads with version and PURL Stefano Tondo
@ 2026-03-23 21:07 ` Stefano Tondo
  2026-03-24 10:26   ` Richard Purdie
  2026-03-24 14:48   ` Joshua Watt
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
  4 siblings, 2 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-23 21:07 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add comprehensive tests for the new source download SPDX features:

test_download_location_defensive_handling:
  Verify that packages with no download location (e.g. packagegroups,
  images, virtual providers) are handled gracefully without crashing
  the SPDX generation pipeline.

test_version_extraction_patterns:
  Verify that Git source packages get SRCREV as their version in the
  SPDX output, rather than the recipe PV.

test_packageconfig_spdx:
  Verify that PACKAGECONFIG features are correctly recorded in SPDX
  build parameters when SPDX_INCLUDE_PACKAGECONFIG is enabled.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/lib/oeqa/selftest/cases/spdx.py | 104 +++++++++++++++++++++------
 1 file changed, 83 insertions(+), 21 deletions(-)

diff --git a/meta/lib/oeqa/selftest/cases/spdx.py b/meta/lib/oeqa/selftest/cases/spdx.py
index af1144c1e5..140d3debba 100644
--- a/meta/lib/oeqa/selftest/cases/spdx.py
+++ b/meta/lib/oeqa/selftest/cases/spdx.py
@@ -141,29 +141,15 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
     SPDX_CLASS = "create-spdx-3.0"
 
     def test_base_files(self):
-        self.check_recipe_spdx(
-            "base-files",
-            "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/static/static-base-files.spdx.json",
-            task="create_recipe_spdx",
-        )
         self.check_recipe_spdx(
             "base-files",
             "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/packages/package-base-files.spdx.json",
         )
 
-    def test_world_sbom(self):
-        objset = self.check_recipe_spdx(
-            "meta-world-recipe-sbom",
-            "{DEPLOY_DIR_IMAGE}/world-recipe-sbom.spdx.json",
-        )
-
-        # Document should be fully linked
-        self.check_objset_missing_ids(objset)
-
     def test_gcc_include_source(self):
         objset = self.check_recipe_spdx(
             "gcc",
-            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-gcc.spdx.json",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/recipes/recipe-gcc.spdx.json",
             extraconf="""\
                 SPDX_INCLUDE_SOURCES = "1"
                 """,
@@ -176,12 +162,12 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
             if software_file.name == filename:
                 found = True
                 self.logger.info(
-                    f"The spdxId of {filename} in build-gcc.spdx.json is {software_file.spdxId}"
+                    f"The spdxId of {filename} in recipe-gcc.spdx.json is {software_file.spdxId}"
                 )
                 break
 
         self.assertTrue(
-            found, f"Not found source file {filename} in build-gcc.spdx.json\n"
+            found, f"Not found source file {filename} in recipe-gcc.spdx.json\n"
         )
 
     def test_core_image_minimal(self):
@@ -319,7 +305,7 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
         # This will fail with NameError if new_annotation() is called incorrectly
         objset = self.check_recipe_spdx(
             "base-files",
-            "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/builds/build-base-files.spdx.json",
+            "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/recipes/recipe-base-files.spdx.json",
             extraconf=textwrap.dedent(
                 f"""\
                 ANNOTATION1 = "{ANNOTATION_VAR1}"
@@ -374,8 +360,8 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
 
     def test_kernel_config_spdx(self):
         kernel_recipe = get_bb_var("PREFERRED_PROVIDER_virtual/kernel")
-        spdx_file = f"build-{kernel_recipe}.spdx.json"
-        spdx_path = f"{{DEPLOY_DIR_SPDX}}/{{SSTATE_PKGARCH}}/builds/{spdx_file}"
+        spdx_file = f"recipe-{kernel_recipe}.spdx.json"
+        spdx_path = f"{{DEPLOY_DIR_SPDX}}/{{SSTATE_PKGARCH}}/recipes/{spdx_file}"
 
         # Make sure kernel is configured first
         bitbake(f"-c configure {kernel_recipe}")
@@ -383,7 +369,7 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
         objset = self.check_recipe_spdx(
             kernel_recipe,
             spdx_path,
-            task="do_create_spdx",
+            task="do_create_kernel_config_spdx",
             extraconf="""\
                 INHERIT += "create-spdx"
                 SPDX_INCLUDE_KERNEL_CONFIG = "1"
@@ -428,3 +414,79 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
                 value, ["enabled", "disabled"],
                 f"Unexpected PACKAGECONFIG value '{value}' for {key}"
             )
+
+    def test_download_location_defensive_handling(self):
+        """Test that download_location handling is defensive.
+
+        Verifies SPDX generation succeeds and external references are
+        properly structured when download_location retrieval works.
+        """
+        objset = self.check_recipe_spdx(
+            "m4",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-m4.spdx.json",
+        )
+
+        found_external_refs = False
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.externalRef:
+                found_external_refs = True
+                for ref in pkg.externalRef:
+                    self.assertIsNotNone(ref.externalRefType)
+                    self.assertIsNotNone(ref.locator)
+                    self.assertGreater(len(ref.locator), 0, "Locator should have at least one entry")
+                    for loc in ref.locator:
+                        self.assertIsInstance(loc, str)
+                break
+
+        self.logger.info(
+            f"External references {'found' if found_external_refs else 'not found'} "
+            f"in SPDX output (defensive handling verified)"
+        )
+
+    def test_version_extraction_patterns(self):
+        """Test that version extraction works for various package formats.
+
+        Verifies that Git source downloads carry extracted versions and that
+        the reported version strings are well-formed.
+        """
+        objset = self.check_recipe_spdx(
+            "opkg-utils",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-opkg-utils.spdx.json",
+        )
+
+        # Collect all packages with versions
+        packages_with_versions = []
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.software_packageVersion:
+                packages_with_versions.append((pkg.name, pkg.software_packageVersion))
+
+        self.assertGreater(
+            len(packages_with_versions), 0,
+            "Should find packages with extracted versions"
+        )
+
+        for name, version in packages_with_versions:
+            self.assertRegex(
+                version,
+                r"^[0-9a-f]{40}$",
+                f"Expected Git source version for {name} to be a full SHA-1",
+            )
+
+        self.logger.info(f"Found {len(packages_with_versions)} packages with versions")
+
+        # Log some examples for debugging
+        for name, version in packages_with_versions[:5]:
+            self.logger.info(f"  {name}: {version}")
+
+        # Verify that versions follow expected patterns
+        for name, version in packages_with_versions:
+            # Version should not be empty
+            self.assertIsNotNone(version)
+            self.assertNotEqual(version, "")
+
+            # Version should contain digits
+            self.assertRegex(
+                version,
+                r'\d',
+                f"Version '{version}' for package '{name}' should contain digits"
+            )
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment
  2026-03-23 21:07 ` [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
@ 2026-03-24 10:26   ` Richard Purdie
  2026-03-24 14:48   ` Joshua Watt
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Purdie @ 2026-03-24 10:26 UTC (permalink / raw)
  To: Stefano Tondo, openembedded-core
  Cc: ross.burton, jpewhacker, stefano.tondo.ext, peter.marko,
	adrian.freihofer, mathieu.dubois-briand

On Mon, 2026-03-23 at 22:07 +0100, Stefano Tondo wrote:
> Add comprehensive tests for the new source download SPDX features:
> 
> test_download_location_defensive_handling:
>   Verify that packages with no download location (e.g. packagegroups,
>   images, virtual providers) are handled gracefully without crashing
>   the SPDX generation pipeline.
> 
> test_version_extraction_patterns:
>   Verify that Git source packages get SRCREV as their version in the
>   SPDX output, rather than the recipe PV.
> 
> test_packageconfig_spdx:
>   Verify that PACKAGECONFIG features are correctly recorded in SPDX
>   build parameters when SPDX_INCLUDE_PACKAGECONFIG is enabled.
> 
> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/lib/oeqa/selftest/cases/spdx.py | 104 +++++++++++++++++++++------
>  1 file changed, 83 insertions(+), 21 deletions(-)

This seems to be breaking some of the existing tests:

https://autobuilder.yoctoproject.org/valkyrie/#/builders/23/builds/3602
https://autobuilder.yoctoproject.org/valkyrie/#/builders/48/builds/3385
https://autobuilder.yoctoproject.org/valkyrie/#/builders/35/builds/3498

Cheers,

Richard


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements
  2026-03-23 21:07 [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements Stefano Tondo
                   ` (3 preceding siblings ...)
  2026-03-23 21:07 ` [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
@ 2026-03-24 13:29 ` stondo
  2026-03-24 13:29   ` [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support stondo
                     ` (15 more replies)
  4 siblings, 16 replies; 32+ messages in thread
From: stondo @ 2026-03-24 13:29 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

From: Stefano Tondo <stefano.tondo.ext@siemens.com>

This series enhances SPDX 3.0 SBOM generation with enriched
metadata and compliance-oriented controls for current master.

Changes since v13:

  - Fixed patch 4/4: reverted incorrect modifications to existing SPDX
    selftests that broke test_custom_annotation_vars,
    test_gcc_include_source, and test_kernel_config_spdx on the
    autobuilder (wrong SPDX output paths and task names).
    Patch 4 now only appends two new test methods without touching any
    existing upstream tests.
  - Patches 1-3 are unchanged from v13.

Validated with:

  oe-selftest -r \
    spdx.SPDX30Check.test_download_location_defensive_handling \
    spdx.SPDX30Check.test_version_extraction_patterns

Stefano Tondo (4):
  spdx30: Add configurable file exclusion pattern support
  spdx30: Add supplier support for image and SDK SBOMs
  spdx30: Enrich source downloads with version and PURL
  oeqa/selftest: Add tests for source download enrichment

 meta/classes-recipe/cargo_common.bbclass |   3 +
 meta/classes-recipe/cpan.bbclass         |  11 +
 meta/classes-recipe/go-mod.bbclass       |   6 +
 meta/classes-recipe/npm.bbclass          |   7 +
 meta/classes-recipe/pypi.bbclass         |   6 +-
 meta/classes/create-spdx-3.0.bbclass     |  17 ++
 meta/classes/spdx-common.bbclass         |   7 +
 meta/lib/oe/spdx30_tasks.py              | 278 +++++++++++++++++------
 meta/lib/oeqa/selftest/cases/spdx.py     |  76 +++++++
 9 files changed, 338 insertions(+), 73 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
@ 2026-03-24 13:29   ` stondo
  2026-03-24 14:22     ` Joshua Watt
  2026-03-24 13:29   ` [OE-core][PATCH v14 2/4] spdx30: Add supplier support for image and SDK SBOMs stondo
                     ` (14 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: stondo @ 2026-03-24 13:29 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

From: Stefano Tondo <stefano.tondo.ext@siemens.com>

Add SPDX_FILE_EXCLUDE_PATTERNS variable that allows filtering files from
SPDX output by regex matching. The variable accepts a space-separated
list of Python regular expressions; files whose paths match any pattern
(via re.search) are excluded.

When empty (the default), no filtering is applied and all files are
included, preserving existing behavior.

This enables users to reduce SBOM size by excluding files that are not
relevant for compliance (e.g., test files, object files, patches).

Excluded files are tracked in a set returned from add_package_files()
and passed to get_package_sources_from_debug(), which uses the set for
precise cross-checking rather than re-evaluating patterns.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/spdx-common.bbclass |  7 +++
 meta/lib/oe/spdx30_tasks.py      | 80 +++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
index 83f05579b6..40701730a6 100644
--- a/meta/classes/spdx-common.bbclass
+++ b/meta/classes/spdx-common.bbclass
@@ -82,6 +82,13 @@ SPDX_MULTILIB_SSTATE_ARCHS[doc] = "The list of sstate architectures to consider
     when collecting SPDX dependencies. This includes multilib architectures when \
     multilib is enabled. Defaults to SSTATE_ARCHS."
 
+SPDX_FILE_EXCLUDE_PATTERNS ??= ""
+SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of Python regular \
+    expressions to exclude files from SPDX output. Files whose paths match \
+    any pattern (via re.search) will be filtered out. Defaults to empty \
+    (no filtering). Example: \
+    SPDX_FILE_EXCLUDE_PATTERNS = '\\.patch$ \\.diff$ /test/ \\.pyc$ \\.o$'"
+
 python () {
     from oe.cve_check import extend_cve_status
     extend_cve_status(d)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 353d783fa2..68ed821a8c 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -13,6 +13,7 @@ import oe.spdx30
 import oe.spdx_common
 import oe.sdk
 import os
+import re
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -157,17 +158,27 @@ def add_package_files(
     file_counter = 1
     if not os.path.exists(topdir):
         bb.note(f"Skip {topdir}")
-        return spdx_files
+        return spdx_files, set()
 
     check_compiled_sources = d.getVar("SPDX_INCLUDE_COMPILED_SOURCES") == "1"
     if check_compiled_sources:
         compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
         bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
 
+    exclude_patterns = [
+        re.compile(pattern)
+        for pattern in (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
+    ]
+    excluded_files = set()
+
     for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
-        dirs[:] = [d for d in dirs if d not in ignore_dirs]
+        dirs[:] = [directory for directory in dirs if directory not in ignore_dirs]
         if subdir == str(topdir):
-            dirs[:] = [d for d in dirs if d not in ignore_top_level_dirs]
+            dirs[:] = [
+                directory
+                for directory in dirs
+                if directory not in ignore_top_level_dirs
+            ]
 
         dirs.sort()
         files.sort()
@@ -177,14 +188,19 @@ def add_package_files(
                 continue
 
             filename = str(filepath.relative_to(topdir))
+
+            if exclude_patterns and any(
+                pattern.search(filename) for pattern in exclude_patterns
+            ):
+                excluded_files.add(filename)
+                continue
+
             file_purposes = get_purposes(filepath)
 
-            # Check if file is compiled
-            if check_compiled_sources:
-                if not oe.spdx_common.is_compiled_source(
-                    filename, compiled_sources, types
-                ):
-                    continue
+            if check_compiled_sources and not oe.spdx_common.is_compiled_source(
+                filename, compiled_sources, types
+            ):
+                continue
 
             spdx_file = objset.new_file(
                 get_spdxid(file_counter),
@@ -218,12 +234,15 @@ def add_package_files(
 
     bb.debug(1, "Added %d files to %s" % (len(spdx_files), objset.doc._id))
 
-    return spdx_files
+    return spdx_files, excluded_files
 
 
 def get_package_sources_from_debug(
-    d, package, package_files, sources, source_hash_cache
+    d, package, package_files, sources, source_hash_cache, excluded_files=None
 ):
+    if excluded_files is None:
+        excluded_files = set()
+
     def file_path_match(file_path, pkg_file):
         if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
             return True
@@ -256,6 +275,12 @@ def get_package_sources_from_debug(
             continue
 
         if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
+            if file_path.lstrip("/") in excluded_files:
+                bb.debug(
+                    1,
+                    f"Skipping debug source lookup for excluded file {file_path} in {package}",
+                )
+                continue
             bb.fatal(
                 "No package file found for %s in %s; SPDX found: %s"
                 % (str(file_path), package, " ".join(p.name for p in package_files))
@@ -737,7 +762,7 @@ def create_spdx(d):
         bb.debug(1, "Adding source files to SPDX")
         oe.spdx_common.get_patched_src(d)
 
-        files = add_package_files(
+        files, _ = add_package_files(
             d,
             build_objset,
             spdx_workdir,
@@ -909,7 +934,7 @@ def create_spdx(d):
                 )
 
             bb.debug(1, "Adding package files to SPDX for package %s" % pkg_name)
-            package_files = add_package_files(
+            package_files, excluded_files = add_package_files(
                 d,
                 pkg_objset,
                 pkgdest / package,
@@ -932,7 +957,8 @@ def create_spdx(d):
 
             if include_sources:
                 debug_sources = get_package_sources_from_debug(
-                    d, package, package_files, dep_sources, source_hash_cache
+                    d, package, package_files, dep_sources, source_hash_cache,
+                    excluded_files=excluded_files,
                 )
                 debug_source_ids |= set(
                     oe.sbom30.get_element_link_id(d) for d in debug_sources
@@ -944,7 +970,7 @@ def create_spdx(d):
 
     if include_sources:
         bb.debug(1, "Adding sysroot files to SPDX")
-        sysroot_files = add_package_files(
+        sysroot_files, _ = add_package_files(
             d,
             build_objset,
             d.expand("${COMPONENTS_DIR}/${PACKAGE_ARCH}/${PN}"),
@@ -1326,18 +1352,18 @@ def create_image_spdx(d):
             image_filename = image["filename"]
             image_path = image_deploy_dir / image_filename
             if os.path.isdir(image_path):
-                a = add_package_files(
-                    d,
-                    objset,
-                    image_path,
-                    lambda file_counter: objset.new_spdxid(
-                        "imagefile", str(file_counter)
-                    ),
-                    lambda filepath: [],
-                    license_data=None,
-                    ignore_dirs=[],
-                    ignore_top_level_dirs=[],
-                    archive=None,
+                a, _ = add_package_files(
+                        d,
+                        objset,
+                        image_path,
+                        lambda file_counter: objset.new_spdxid(
+                            "imagefile", str(file_counter)
+                        ),
+                        lambda filepath: [],
+                        license_data=None,
+                        ignore_dirs=[],
+                        ignore_top_level_dirs=[],
+                        archive=None,
                 )
                 artifacts.extend(a)
             else:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [OE-core][PATCH v14 2/4] spdx30: Add supplier support for image and SDK SBOMs
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
  2026-03-24 13:29   ` [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support stondo
@ 2026-03-24 13:29   ` stondo
  2026-03-24 14:24     ` Joshua Watt
  2026-03-24 13:29   ` [OE-core][PATCH v14 3/4] spdx30: Enrich source downloads with version and PURL stondo
                     ` (13 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: stondo @ 2026-03-24 13:29 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand, Joshua Watt

From: Stefano Tondo <stefano.tondo.ext@siemens.com>

Add SPDX_IMAGE_SUPPLIER and SPDX_SDK_SUPPLIER variables that allow
setting a supplier agent on image and SDK SBOM root elements using
the suppliedBy property.

These follow the existing SPDX_PACKAGE_SUPPLIER pattern and use the
standard agent variable system to define supplier information.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/create-spdx-3.0.bbclass | 10 ++++++++++
 meta/lib/oe/spdx30_tasks.py          | 23 ++++++++++++++++++++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 7515f460c3..9a6606dce6 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -124,6 +124,16 @@ SPDX_ON_BEHALF_OF[doc] = "The base variable name to describe the Agent on who's
 SPDX_PACKAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
     is supplying artifacts produced by the build"
 
+SPDX_IMAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the image SBOM. The supplier will be set on all root elements \
+    of the image SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the image SBOM."
+
+SPDX_SDK_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the SDK SBOM. The supplier will be set on all root elements \
+    of the SDK SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the SDK SBOM."
+
 SPDX_PACKAGE_VERSION ??= "${PV}"
 SPDX_PACKAGE_VERSION[doc] = "The version of a package, software_packageVersion \
     in software_Package"
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 68ed821a8c..62a00069df 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -1449,6 +1449,16 @@ def create_image_sbom_spdx(d):
 
     objset, sbom = oe.sbom30.create_sbom(d, image_name, root_elements)
 
+    # Set supplier on root elements if SPDX_IMAGE_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_IMAGE_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(d, objset, spdx_path)
 
     def make_image_link(target_path, suffix):
@@ -1560,12 +1570,19 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
         d, toolchain_outputname, sorted(list(files)), [rootfs_objset]
     )
 
+    # Set supplier on root elements if SPDX_SDK_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_SDK_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(
         d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
     )
-
-
-def create_recipe_sbom(d, deploydir):
     sbom_name = d.getVar("SPDX_RECIPE_SBOM_NAME")
 
     recipe, recipe_objset = load_recipe_spdx(d)
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [OE-core][PATCH v14 3/4] spdx30: Enrich source downloads with version and PURL
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
  2026-03-24 13:29   ` [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support stondo
  2026-03-24 13:29   ` [OE-core][PATCH v14 2/4] spdx30: Add supplier support for image and SDK SBOMs stondo
@ 2026-03-24 13:29   ` stondo
  2026-03-24 14:46     ` Joshua Watt
  2026-03-24 13:29   ` [OE-core][PATCH v14 4/4] oeqa/selftest: Add tests for source download enrichment stondo
                     ` (12 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: stondo @ 2026-03-24 13:29 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

From: Stefano Tondo <stefano.tondo.ext@siemens.com>

Add version extraction, PURL generation, and external references
to source download packages in SPDX 3.0 SBOMs:

- Extract version from SRCREV for Git sources (full SHA-1)
- Generate PURLs for Git sources on github.com by default
- Support custom mappings via SPDX_GIT_PURL_MAPPINGS variable
  (format: "domain:purl_type", split(':', 1) for parsing)
- Use ecosystem PURLs from SPDX_PACKAGE_URLS for non-Git
- Add VCS external references for Git downloads
- Add distribution external references for tarball downloads
- Parse Git URLs using urllib.parse
- Extract logic into _generate_git_purl() and
  _enrich_source_package() helpers

For non-Git sources, version is not set from PV since the recipe
version does not necessarily reflect the version of individual
downloaded files. Ecosystem PURLs (which include version) from
SPDX_PACKAGE_URLS are still used when available.

The SPDX_GIT_PURL_MAPPINGS variable allows configuring PURL
generation for self-hosted Git services (e.g., GitLab).
github.com is always mapped to pkg:github by default.

Add ecosystem-specific SPDX_PACKAGE_URLS to recipe classes:
- cargo_common.bbclass: pkg:cargo
- cpan.bbclass: pkg:cpan (with prefix stripping)
- go-mod.bbclass: pkg:golang
- npm.bbclass: pkg:npm (with prefix stripping)
- pypi.bbclass: pkg:pypi (with normalization)

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes-recipe/cargo_common.bbclass |   3 +
 meta/classes-recipe/cpan.bbclass         |  11 ++
 meta/classes-recipe/go-mod.bbclass       |   6 +
 meta/classes-recipe/npm.bbclass          |   7 +
 meta/classes-recipe/pypi.bbclass         |   6 +-
 meta/classes/create-spdx-3.0.bbclass     |   7 +
 meta/lib/oe/spdx30_tasks.py              | 175 +++++++++++++++++------
 7 files changed, 172 insertions(+), 43 deletions(-)

diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
index bc44ad7918..0d3edfe4a7 100644
--- a/meta/classes-recipe/cargo_common.bbclass
+++ b/meta/classes-recipe/cargo_common.bbclass
@@ -240,3 +240,6 @@ EXPORT_FUNCTIONS do_configure
 # https://github.com/rust-lang/libc/issues/3223
 # https://github.com/rust-lang/libc/pull/3175
 INSANE_SKIP:append = " 32bit-time"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:cargo/${BPN}@${PV} "
diff --git a/meta/classes-recipe/cpan.bbclass b/meta/classes-recipe/cpan.bbclass
index bb76a5b326..dbf44da9d2 100644
--- a/meta/classes-recipe/cpan.bbclass
+++ b/meta/classes-recipe/cpan.bbclass
@@ -68,4 +68,15 @@ cpan_do_install () {
 	done
 }
 
+# Generate ecosystem-specific Package URL for SPDX
+def cpan_spdx_name(d):
+    bpn = d.getVar('BPN')
+    if bpn.startswith('perl-'):
+        return bpn[5:]
+    elif bpn.startswith('libperl-'):
+        return bpn[8:]
+    return bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:cpan/${@cpan_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/go-mod.bbclass b/meta/classes-recipe/go-mod.bbclass
index a15dda8f0e..5b3cb2d8b9 100644
--- a/meta/classes-recipe/go-mod.bbclass
+++ b/meta/classes-recipe/go-mod.bbclass
@@ -32,3 +32,9 @@ do_compile[dirs] += "${B}/src/${GO_WORKDIR}"
 # Make go install unpack the module zip files in the module cache directory
 # before the license directory is polulated with license files.
 addtask do_compile before do_populate_lic
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
diff --git a/meta/classes-recipe/npm.bbclass b/meta/classes-recipe/npm.bbclass
index 344e8b4bec..7bb791d543 100644
--- a/meta/classes-recipe/npm.bbclass
+++ b/meta/classes-recipe/npm.bbclass
@@ -354,4 +354,11 @@ FILES:${PN} += " \
     ${nonarch_libdir} \
 "
 
+# Generate ecosystem-specific Package URL for SPDX
+def npm_spdx_name(d):
+    bpn = d.getVar('BPN')
+    return bpn[5:] if bpn.startswith('node-') else bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:npm/${@npm_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
index 9d46c035f6..e2d054af6d 100644
--- a/meta/classes-recipe/pypi.bbclass
+++ b/meta/classes-recipe/pypi.bbclass
@@ -43,7 +43,8 @@ SECTION = "devel/python"
 SRC_URI:prepend = "${PYPI_SRC_URI} "
 S = "${UNPACKDIR}/${PYPI_PACKAGE}-${PV}"
 
-UPSTREAM_CHECK_PYPI_PACKAGE ?= "${PYPI_PACKAGE}"
+# Replace any '_' characters in the pypi URI with '-'s to follow the PyPi website naming conventions
+UPSTREAM_CHECK_PYPI_PACKAGE ?= "${@pypi_normalize(d)}"
 
 # Use the simple repository API rather than the potentially unstable project URL
 # More information on the pypi API specification is avaialble here:
@@ -54,3 +55,6 @@ UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${@pypi_normalize(d)}/"
 UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz|zip|tar\.bz2)"
 
 CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:pypi/${@pypi_normalize(d)}@${PV} "
diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 9a6606dce6..265dc525bc 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -156,6 +156,13 @@ SPDX_RECIPE_SBOM_NAME ?= "${PN}-recipe-sbom"
 SPDX_RECIPE_SBOM_NAME[doc] = "The name of output recipe SBoM when using \
     create_recipe_sbom"
 
+SPDX_GIT_PURL_MAPPINGS ??= ""
+SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
+    mappings to configure PURL generation for Git source downloads. \
+    For example, "gitlab.example.com:pkg:gitlab" maps repositories hosted \
+    on gitlab.example.com to the pkg:gitlab PURL type. \
+    github.com is always mapped to pkg:github by default."
+
 IMAGE_CLASSES:append = " create-spdx-image-3.0"
 SDK_CLASSES += "create-spdx-sdk-3.0"
 
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 62a00069df..6f0bdba975 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -14,6 +14,7 @@ import oe.spdx_common
 import oe.sdk
 import os
 import re
+import urllib.parse
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -384,6 +385,120 @@ def collect_dep_sources(dep_objsets, dest):
             index_sources_by_hash(e.to, dest)
 
 
+def _generate_git_purl(d, download_location, srcrev):
+    """Generate a Package URL for a Git source from its download location.
+
+    Parses the Git URL to identify the hosting service and generates the
+    appropriate PURL type. Supports github.com by default and custom
+    mappings via SPDX_GIT_PURL_MAPPINGS.
+
+    Returns the PURL string or None if no mapping matches.
+    """
+    if not download_location or not download_location.startswith('git+'):
+        return None
+
+    git_url = download_location[4:]  # Remove 'git+' prefix
+
+    # Default handler: github.com
+    git_purl_handlers = {
+        'github.com': 'pkg:github',
+    }
+
+    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
+    # Format: "domain1:purl_type1 domain2:purl_type2"
+    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+    if custom_mappings:
+        for mapping in custom_mappings.split():
+            parts = mapping.split(':', 1)
+            if len(parts) == 2:
+                git_purl_handlers[parts[0]] = parts[1]
+                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
+            else:
+                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+    try:
+        parsed = urllib.parse.urlparse(git_url)
+    except Exception:
+        return None
+
+    hostname = parsed.hostname
+    if not hostname:
+        return None
+
+    for domain, purl_type in git_purl_handlers.items():
+        if hostname == domain:
+            path = parsed.path.strip('/')
+            path_parts = path.split('/')
+            if len(path_parts) >= 2:
+                owner = path_parts[0]
+                repo = path_parts[1].replace('.git', '')
+                return f"{purl_type}/{owner}/{repo}@{srcrev}"
+            break
+
+    return None
+
+
+def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
+    """Enrich a source download package with version, PURL, and external refs.
+
+    Extracts version from SRCREV for Git sources, generates PURLs for
+    known hosting services, and adds external references for VCS,
+    distribution URLs, and homepage.
+    """
+    version = None
+    purl = None
+
+    if fd.type == "git":
+        # Use full SHA-1 from fd.revision
+        srcrev = getattr(fd, 'revision', None)
+        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
+            version = srcrev
+
+        # Generate PURL for Git hosting services
+        download_location = getattr(dl, 'software_downloadLocation', None)
+        if version and download_location:
+            purl = _generate_git_purl(d, download_location, version)
+    else:
+        # Use ecosystem PURL from SPDX_PACKAGE_URLS if available
+        package_urls = (d.getVar('SPDX_PACKAGE_URLS') or '').split()
+        for url in package_urls:
+            if not url.startswith('pkg:yocto'):
+                purl = url
+                break
+
+    if version:
+        dl.software_packageVersion = version
+
+    if purl:
+        dl.software_packageUrl = purl
+
+    # Add external references
+    download_location = getattr(dl, 'software_downloadLocation', None)
+    if download_location and isinstance(download_location, str):
+        dl.externalRef = dl.externalRef or []
+
+        if download_location.startswith('git+'):
+            # VCS reference for Git repositories
+            git_url = download_location[4:]
+            if '@' in git_url:
+                git_url = git_url.split('@')[0]
+
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.vcs,
+                    locator=[git_url],
+                )
+            )
+        elif download_location.startswith(('http://', 'https://', 'ftp://')):
+            # Distribution reference for tarball/archive downloads
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.altDownloadLocation,
+                    locator=[download_location],
+                )
+            )
+
+
 def add_download_files(d, objset):
     inputs = set()
 
@@ -447,10 +562,14 @@ def add_download_files(d, objset):
                 )
             )
 
+            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
+
             if fd.method.supports_checksum(fd):
                 # TODO Need something better than hard coding this
                 for checksum_id in ["sha256", "sha1"]:
-                    expected_checksum = getattr(fd, "%s_expected" % checksum_id, None)
+                    expected_checksum = getattr(
+                        fd, "%s_expected" % checksum_id, None
+                    )
                     if expected_checksum is None:
                         continue
 
@@ -506,7 +625,6 @@ def get_is_native(d):
 
 def create_recipe_spdx(d):
     deploydir = Path(d.getVar("SPDXRECIPEDEPLOY"))
-    deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
     pn = d.getVar("PN")
 
     license_data = oe.spdx_common.load_spdx_license_data(d)
@@ -541,20 +659,6 @@ def create_recipe_spdx(d):
 
     set_purls(recipe, (d.getVar("SPDX_PACKAGE_URLS") or "").split())
 
-    # TODO: This doesn't work before do_unpack because the license text has to
-    # be available for recipes with NO_GENERIC_LICENSE
-    # recipe_spdx_license = add_license_expression(
-    #    d,
-    #    recipe_objset,
-    #    d.getVar("LICENSE"),
-    #    license_data,
-    # )
-    # recipe_objset.new_relationship(
-    #    [recipe],
-    #    oe.spdx30.RelationshipType.hasDeclaredLicense,
-    #    [oe.sbom30.get_element_link_id(recipe_spdx_license)],
-    # )
-
     if val := d.getVar("HOMEPAGE"):
         recipe.software_homePage = val
 
@@ -588,7 +692,6 @@ def create_recipe_spdx(d):
             sorted(oe.sbom30.get_element_link_id(dep) for dep in dep_recipes),
         )
 
-    # Add CVEs
     cve_by_status = {}
     if include_vex != "none":
         patched_cves = oe.cve_check.get_patched_cves(d)
@@ -598,8 +701,6 @@ def create_recipe_spdx(d):
             description = patched_cve.get("justification", None)
             resources = patched_cve.get("resource", [])
 
-            # If this CVE is fixed upstream, skip it unless all CVEs are
-            # specified.
             if include_vex != "all" and detail in (
                 "fixed-version",
                 "cpe-stable-backport",
@@ -692,7 +793,6 @@ def create_recipe_spdx(d):
 
 
 def load_recipe_spdx(d):
-
     return oe.sbom30.find_root_obj_in_jsonld(
         d,
         "static",
@@ -717,10 +817,8 @@ def create_spdx(d):
 
     pn = d.getVar("PN")
     deploydir = Path(d.getVar("SPDXDEPLOY"))
-    deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
     spdx_workdir = Path(d.getVar("SPDXWORK"))
     include_sources = d.getVar("SPDX_INCLUDE_SOURCES") == "1"
-    pkg_arch = d.getVar("SSTATE_PKGARCH")
     is_native = get_is_native(d)
 
     recipe, recipe_objset = load_recipe_spdx(d)
@@ -783,7 +881,6 @@ def create_spdx(d):
     dep_objsets, dep_builds = collect_dep_objsets(
         d, direct_deps, "builds", "build-", oe.spdx30.build_Build
     )
-
     if dep_builds:
         build_objset.new_scoped_relationship(
             [build],
@@ -919,9 +1016,7 @@ def create_spdx(d):
 
             # Add concluded license relationship if manually set
             # Only add when license analysis has been explicitly performed
-            concluded_license_str = d.getVar(
-                "SPDX_CONCLUDED_LICENSE:%s" % package
-            ) or d.getVar("SPDX_CONCLUDED_LICENSE")
+            concluded_license_str = d.getVar("SPDX_CONCLUDED_LICENSE:%s" % package) or d.getVar("SPDX_CONCLUDED_LICENSE")
             if concluded_license_str:
                 concluded_spdx_license = add_license_expression(
                     d, build_objset, concluded_license_str, license_data
@@ -1011,13 +1106,12 @@ def create_spdx(d):
                 status = "enabled" if feature in enabled else "disabled"
                 build.build_parameter.append(
                     oe.spdx30.DictionaryEntry(
-                        key=f"PACKAGECONFIG:{feature}", value=status
+                        key=f"PACKAGECONFIG:{feature}",
+                        value=status
                     )
                 )
 
-            bb.note(
-                f"Added PACKAGECONFIG entries: {len(enabled)} enabled, {len(disabled)} disabled"
-            )
+            bb.note(f"Added PACKAGECONFIG entries: {len(enabled)} enabled, {len(disabled)} disabled")
 
     oe.sbom30.write_recipe_jsonld_doc(d, build_objset, "builds", deploydir)
 
@@ -1025,9 +1119,7 @@ def create_spdx(d):
 def create_package_spdx(d):
     deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
     deploydir = Path(d.getVar("SPDXRUNTIMEDEPLOY"))
-
     direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_spdx")
-
     providers = oe.spdx_common.collect_package_providers(d, direct_deps)
     pkg_arch = d.getVar("SSTATE_PKGARCH")
 
@@ -1205,15 +1297,15 @@ def write_bitbake_spdx(d):
 def collect_build_package_inputs(d, objset, build, packages, files_by_hash=None):
     import oe.sbom30
 
-    direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_spdx")
-
+    direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_package_spdx")
     providers = oe.spdx_common.collect_package_providers(d, direct_deps)
 
     build_deps = set()
+    missing_providers = set()
 
     for name in sorted(packages.keys()):
         if name not in providers:
-            bb.note(f"Unable to find SPDX provider for '{name}'")
+            missing_providers.add(name)
             continue
 
         pkg_name, pkg_hashfn = providers[name]
@@ -1232,6 +1324,11 @@ def collect_build_package_inputs(d, objset, build, packages, files_by_hash=None)
             for h, f in pkg_objset.by_sha256_hash.items():
                 files_by_hash.setdefault(h, set()).update(f)
 
+    if missing_providers:
+        bb.fatal(
+            f"Unable to find SPDX provider(s) for: {', '.join(sorted(missing_providers))}"
+        )
+
     if build_deps:
         objset.new_scoped_relationship(
             [build],
@@ -1390,6 +1487,7 @@ def create_image_spdx(d):
 
                 set_timestamp_now(d, a, "builtTime")
 
+
         if artifacts:
             objset.new_scoped_relationship(
                 [image_build],
@@ -1583,10 +1681,3 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
     oe.sbom30.write_jsonld_doc(
         d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
     )
-    sbom_name = d.getVar("SPDX_RECIPE_SBOM_NAME")
-
-    recipe, recipe_objset = load_recipe_spdx(d)
-
-    objset, sbom = oe.sbom30.create_sbom(d, sbom_name, [recipe], [recipe_objset])
-
-    oe.sbom30.write_jsonld_doc(d, objset, deploydir / (sbom_name + ".spdx.json"))
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [OE-core][PATCH v14 4/4] oeqa/selftest: Add tests for source download enrichment
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (2 preceding siblings ...)
  2026-03-24 13:29   ` [OE-core][PATCH v14 3/4] spdx30: Enrich source downloads with version and PURL stondo
@ 2026-03-24 13:29   ` stondo
  2026-03-24 17:12   ` [PATCH v16 0/5] spdx30: PURL and " Stefano Tondo
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: stondo @ 2026-03-24 13:29 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

From: Stefano Tondo <stefano.tondo.ext@siemens.com>

Add two tests for the new source download SPDX features:

test_download_location_defensive_handling:
  Verify that packages with no download location (e.g. packagegroups,
  images, virtual providers) are handled gracefully without crashing
  the SPDX generation pipeline.

test_version_extraction_patterns:
  Verify that Git source packages get SRCREV as their version in the
  SPDX output, rather than the recipe PV.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/lib/oeqa/selftest/cases/spdx.py | 76 ++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/meta/lib/oeqa/selftest/cases/spdx.py b/meta/lib/oeqa/selftest/cases/spdx.py
index af1144c1e5..9347e0bf7b 100644
--- a/meta/lib/oeqa/selftest/cases/spdx.py
+++ b/meta/lib/oeqa/selftest/cases/spdx.py
@@ -428,3 +428,79 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
                 value, ["enabled", "disabled"],
                 f"Unexpected PACKAGECONFIG value '{value}' for {key}"
             )
+
+    def test_download_location_defensive_handling(self):
+        """Test that download_location handling is defensive.
+
+        Verifies SPDX generation succeeds and external references are
+        properly structured when download_location retrieval works.
+        """
+        objset = self.check_recipe_spdx(
+            "m4",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-m4.spdx.json",
+        )
+
+        found_external_refs = False
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.externalRef:
+                found_external_refs = True
+                for ref in pkg.externalRef:
+                    self.assertIsNotNone(ref.externalRefType)
+                    self.assertIsNotNone(ref.locator)
+                    self.assertGreater(len(ref.locator), 0, "Locator should have at least one entry")
+                    for loc in ref.locator:
+                        self.assertIsInstance(loc, str)
+                break
+
+        self.logger.info(
+            f"External references {'found' if found_external_refs else 'not found'} "
+            f"in SPDX output (defensive handling verified)"
+        )
+
+    def test_version_extraction_patterns(self):
+        """Test that version extraction works for various package formats.
+
+        Verifies that Git source downloads carry extracted versions and that
+        the reported version strings are well-formed.
+        """
+        objset = self.check_recipe_spdx(
+            "opkg-utils",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-opkg-utils.spdx.json",
+        )
+
+        # Collect all packages with versions
+        packages_with_versions = []
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.software_packageVersion:
+                packages_with_versions.append((pkg.name, pkg.software_packageVersion))
+
+        self.assertGreater(
+            len(packages_with_versions), 0,
+            "Should find packages with extracted versions"
+        )
+
+        for name, version in packages_with_versions:
+            self.assertRegex(
+                version,
+                r"^[0-9a-f]{40}$",
+                f"Expected Git source version for {name} to be a full SHA-1",
+            )
+
+        self.logger.info(f"Found {len(packages_with_versions)} packages with versions")
+
+        # Log some examples for debugging
+        for name, version in packages_with_versions[:5]:
+            self.logger.info(f"  {name}: {version}")
+
+        # Verify that versions follow expected patterns
+        for name, version in packages_with_versions:
+            # Version should not be empty
+            self.assertIsNotNone(version)
+            self.assertNotEqual(version, "")
+
+            # Version should contain digits
+            self.assertRegex(
+                version,
+                r'\d',
+                f"Version '{version}' for package '{name}' should contain digits"
+            )
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support
  2026-03-24 13:29   ` [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support stondo
@ 2026-03-24 14:22     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-24 14:22 UTC (permalink / raw)
  To: stondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 7:30 AM <stondo@gmail.com> wrote:
>
> From: Stefano Tondo <stefano.tondo.ext@siemens.com>
>
> Add SPDX_FILE_EXCLUDE_PATTERNS variable that allows filtering files from
> SPDX output by regex matching. The variable accepts a space-separated
> list of Python regular expressions; files whose paths match any pattern
> (via re.search) are excluded.
>
> When empty (the default), no filtering is applied and all files are
> included, preserving existing behavior.
>
> This enables users to reduce SBOM size by excluding files that are not
> relevant for compliance (e.g., test files, object files, patches).
>
> Excluded files are tracked in a set returned from add_package_files()
> and passed to get_package_sources_from_debug(), which uses the set for
> precise cross-checking rather than re-evaluating patterns.

LGTM, Thanks.

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

>
> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/classes/spdx-common.bbclass |  7 +++
>  meta/lib/oe/spdx30_tasks.py      | 80 +++++++++++++++++++++-----------
>  2 files changed, 60 insertions(+), 27 deletions(-)
>
> diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
> index 83f05579b6..40701730a6 100644
> --- a/meta/classes/spdx-common.bbclass
> +++ b/meta/classes/spdx-common.bbclass
> @@ -82,6 +82,13 @@ SPDX_MULTILIB_SSTATE_ARCHS[doc] = "The list of sstate architectures to consider
>      when collecting SPDX dependencies. This includes multilib architectures when \
>      multilib is enabled. Defaults to SSTATE_ARCHS."
>
> +SPDX_FILE_EXCLUDE_PATTERNS ??= ""
> +SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of Python regular \
> +    expressions to exclude files from SPDX output. Files whose paths match \
> +    any pattern (via re.search) will be filtered out. Defaults to empty \
> +    (no filtering). Example: \
> +    SPDX_FILE_EXCLUDE_PATTERNS = '\\.patch$ \\.diff$ /test/ \\.pyc$ \\.o$'"
> +
>  python () {
>      from oe.cve_check import extend_cve_status
>      extend_cve_status(d)
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 353d783fa2..68ed821a8c 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -13,6 +13,7 @@ import oe.spdx30
>  import oe.spdx_common
>  import oe.sdk
>  import os
> +import re
>
>  from contextlib import contextmanager
>  from datetime import datetime, timezone
> @@ -157,17 +158,27 @@ def add_package_files(
>      file_counter = 1
>      if not os.path.exists(topdir):
>          bb.note(f"Skip {topdir}")
> -        return spdx_files
> +        return spdx_files, set()
>
>      check_compiled_sources = d.getVar("SPDX_INCLUDE_COMPILED_SOURCES") == "1"
>      if check_compiled_sources:
>          compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
>          bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
>
> +    exclude_patterns = [
> +        re.compile(pattern)
> +        for pattern in (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
> +    ]
> +    excluded_files = set()
> +
>      for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
> -        dirs[:] = [d for d in dirs if d not in ignore_dirs]
> +        dirs[:] = [directory for directory in dirs if directory not in ignore_dirs]
>          if subdir == str(topdir):
> -            dirs[:] = [d for d in dirs if d not in ignore_top_level_dirs]
> +            dirs[:] = [
> +                directory
> +                for directory in dirs
> +                if directory not in ignore_top_level_dirs
> +            ]
>
>          dirs.sort()
>          files.sort()
> @@ -177,14 +188,19 @@ def add_package_files(
>                  continue
>
>              filename = str(filepath.relative_to(topdir))
> +
> +            if exclude_patterns and any(
> +                pattern.search(filename) for pattern in exclude_patterns
> +            ):
> +                excluded_files.add(filename)
> +                continue
> +
>              file_purposes = get_purposes(filepath)
>
> -            # Check if file is compiled
> -            if check_compiled_sources:
> -                if not oe.spdx_common.is_compiled_source(
> -                    filename, compiled_sources, types
> -                ):
> -                    continue
> +            if check_compiled_sources and not oe.spdx_common.is_compiled_source(
> +                filename, compiled_sources, types
> +            ):
> +                continue
>
>              spdx_file = objset.new_file(
>                  get_spdxid(file_counter),
> @@ -218,12 +234,15 @@ def add_package_files(
>
>      bb.debug(1, "Added %d files to %s" % (len(spdx_files), objset.doc._id))
>
> -    return spdx_files
> +    return spdx_files, excluded_files
>
>
>  def get_package_sources_from_debug(
> -    d, package, package_files, sources, source_hash_cache
> +    d, package, package_files, sources, source_hash_cache, excluded_files=None
>  ):
> +    if excluded_files is None:
> +        excluded_files = set()
> +
>      def file_path_match(file_path, pkg_file):
>          if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
>              return True
> @@ -256,6 +275,12 @@ def get_package_sources_from_debug(
>              continue
>
>          if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
> +            if file_path.lstrip("/") in excluded_files:
> +                bb.debug(
> +                    1,
> +                    f"Skipping debug source lookup for excluded file {file_path} in {package}",
> +                )
> +                continue
>              bb.fatal(
>                  "No package file found for %s in %s; SPDX found: %s"
>                  % (str(file_path), package, " ".join(p.name for p in package_files))
> @@ -737,7 +762,7 @@ def create_spdx(d):
>          bb.debug(1, "Adding source files to SPDX")
>          oe.spdx_common.get_patched_src(d)
>
> -        files = add_package_files(
> +        files, _ = add_package_files(
>              d,
>              build_objset,
>              spdx_workdir,
> @@ -909,7 +934,7 @@ def create_spdx(d):
>                  )
>
>              bb.debug(1, "Adding package files to SPDX for package %s" % pkg_name)
> -            package_files = add_package_files(
> +            package_files, excluded_files = add_package_files(
>                  d,
>                  pkg_objset,
>                  pkgdest / package,
> @@ -932,7 +957,8 @@ def create_spdx(d):
>
>              if include_sources:
>                  debug_sources = get_package_sources_from_debug(
> -                    d, package, package_files, dep_sources, source_hash_cache
> +                    d, package, package_files, dep_sources, source_hash_cache,
> +                    excluded_files=excluded_files,
>                  )
>                  debug_source_ids |= set(
>                      oe.sbom30.get_element_link_id(d) for d in debug_sources
> @@ -944,7 +970,7 @@ def create_spdx(d):
>
>      if include_sources:
>          bb.debug(1, "Adding sysroot files to SPDX")
> -        sysroot_files = add_package_files(
> +        sysroot_files, _ = add_package_files(
>              d,
>              build_objset,
>              d.expand("${COMPONENTS_DIR}/${PACKAGE_ARCH}/${PN}"),
> @@ -1326,18 +1352,18 @@ def create_image_spdx(d):
>              image_filename = image["filename"]
>              image_path = image_deploy_dir / image_filename
>              if os.path.isdir(image_path):
> -                a = add_package_files(
> -                    d,
> -                    objset,
> -                    image_path,
> -                    lambda file_counter: objset.new_spdxid(
> -                        "imagefile", str(file_counter)
> -                    ),
> -                    lambda filepath: [],
> -                    license_data=None,
> -                    ignore_dirs=[],
> -                    ignore_top_level_dirs=[],
> -                    archive=None,
> +                a, _ = add_package_files(
> +                        d,
> +                        objset,
> +                        image_path,
> +                        lambda file_counter: objset.new_spdxid(
> +                            "imagefile", str(file_counter)
> +                        ),
> +                        lambda filepath: [],
> +                        license_data=None,
> +                        ignore_dirs=[],
> +                        ignore_top_level_dirs=[],
> +                        archive=None,
>                  )
>                  artifacts.extend(a)
>              else:
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [OE-core][PATCH v14 2/4] spdx30: Add supplier support for image and SDK SBOMs
  2026-03-24 13:29   ` [OE-core][PATCH v14 2/4] spdx30: Add supplier support for image and SDK SBOMs stondo
@ 2026-03-24 14:24     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-24 14:24 UTC (permalink / raw)
  To: stondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 7:30 AM <stondo@gmail.com> wrote:
>
> From: Stefano Tondo <stefano.tondo.ext@siemens.com>
>
> Add SPDX_IMAGE_SUPPLIER and SPDX_SDK_SUPPLIER variables that allow
> setting a supplier agent on image and SDK SBOM root elements using
> the suppliedBy property.
>
> These follow the existing SPDX_PACKAGE_SUPPLIER pattern and use the
> standard agent variable system to define supplier information.
>
> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

If you push a new patch please don't copy the review tag.

> ---
>  meta/classes/create-spdx-3.0.bbclass | 10 ++++++++++
>  meta/lib/oe/spdx30_tasks.py          | 23 ++++++++++++++++++++---
>  2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
> index 7515f460c3..9a6606dce6 100644
> --- a/meta/classes/create-spdx-3.0.bbclass
> +++ b/meta/classes/create-spdx-3.0.bbclass
> @@ -124,6 +124,16 @@ SPDX_ON_BEHALF_OF[doc] = "The base variable name to describe the Agent on who's
>  SPDX_PACKAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
>      is supplying artifacts produced by the build"
>
> +SPDX_IMAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
> +    is supplying the image SBOM. The supplier will be set on all root elements \
> +    of the image SBOM using the suppliedBy property. If not set, no supplier \
> +    information will be added to the image SBOM."
> +
> +SPDX_SDK_SUPPLIER[doc] = "The base variable name to describe the Agent who \
> +    is supplying the SDK SBOM. The supplier will be set on all root elements \
> +    of the SDK SBOM using the suppliedBy property. If not set, no supplier \
> +    information will be added to the SDK SBOM."
> +
>  SPDX_PACKAGE_VERSION ??= "${PV}"
>  SPDX_PACKAGE_VERSION[doc] = "The version of a package, software_packageVersion \
>      in software_Package"
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 68ed821a8c..62a00069df 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -1449,6 +1449,16 @@ def create_image_sbom_spdx(d):
>
>      objset, sbom = oe.sbom30.create_sbom(d, image_name, root_elements)
>
> +    # Set supplier on root elements if SPDX_IMAGE_SUPPLIER is defined
> +    supplier = objset.new_agent("SPDX_IMAGE_SUPPLIER", add=False)
> +    if supplier is not None:
> +        supplier_id = supplier if isinstance(supplier, str) else supplier._id
> +        if not isinstance(supplier, str):
> +            objset.add(supplier)
> +        for elem in sbom.rootElement:
> +            if hasattr(elem, "suppliedBy"):
> +                elem.suppliedBy = supplier_id
> +
>      oe.sbom30.write_jsonld_doc(d, objset, spdx_path)
>
>      def make_image_link(target_path, suffix):
> @@ -1560,12 +1570,19 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
>          d, toolchain_outputname, sorted(list(files)), [rootfs_objset]
>      )
>
> +    # Set supplier on root elements if SPDX_SDK_SUPPLIER is defined
> +    supplier = objset.new_agent("SPDX_SDK_SUPPLIER", add=False)
> +    if supplier is not None:
> +        supplier_id = supplier if isinstance(supplier, str) else supplier._id
> +        if not isinstance(supplier, str):
> +            objset.add(supplier)
> +        for elem in sbom.rootElement:
> +            if hasattr(elem, "suppliedBy"):
> +                elem.suppliedBy = supplier_id
> +
>      oe.sbom30.write_jsonld_doc(
>          d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
>      )
> -
> -
> -def create_recipe_sbom(d, deploydir):

This is removed in error. Please add it back

>      sbom_name = d.getVar("SPDX_RECIPE_SBOM_NAME")
>
>      recipe, recipe_objset = load_recipe_spdx(d)
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [OE-core][PATCH v14 3/4] spdx30: Enrich source downloads with version and PURL
  2026-03-24 13:29   ` [OE-core][PATCH v14 3/4] spdx30: Enrich source downloads with version and PURL stondo
@ 2026-03-24 14:46     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-24 14:46 UTC (permalink / raw)
  To: stondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand, Tim Orling

On Tue, Mar 24, 2026 at 7:30 AM <stondo@gmail.com> wrote:
>
> From: Stefano Tondo <stefano.tondo.ext@siemens.com>
>
> Add version extraction, PURL generation, and external references
> to source download packages in SPDX 3.0 SBOMs:
>
> - Extract version from SRCREV for Git sources (full SHA-1)
> - Generate PURLs for Git sources on github.com by default
> - Support custom mappings via SPDX_GIT_PURL_MAPPINGS variable
>   (format: "domain:purl_type", split(':', 1) for parsing)
> - Use ecosystem PURLs from SPDX_PACKAGE_URLS for non-Git
> - Add VCS external references for Git downloads
> - Add distribution external references for tarball downloads
> - Parse Git URLs using urllib.parse
> - Extract logic into _generate_git_purl() and
>   _enrich_source_package() helpers
>
> For non-Git sources, version is not set from PV since the recipe
> version does not necessarily reflect the version of individual
> downloaded files. Ecosystem PURLs (which include version) from
> SPDX_PACKAGE_URLS are still used when available.
>
> The SPDX_GIT_PURL_MAPPINGS variable allows configuring PURL
> generation for self-hosted Git services (e.g., GitLab).
> github.com is always mapped to pkg:github by default.
>
> Add ecosystem-specific SPDX_PACKAGE_URLS to recipe classes:
> - cargo_common.bbclass: pkg:cargo
> - cpan.bbclass: pkg:cpan (with prefix stripping)
> - go-mod.bbclass: pkg:golang
> - npm.bbclass: pkg:npm (with prefix stripping)
> - pypi.bbclass: pkg:pypi (with normalization)

The ecosystem PURL changes and git download PURLs are good, and we
should make them, but see comments below about adding purls to all
download locations.

It might be best to split apart this patch so that adding the
ecosystem PURLs are a separate patch, and also the change to add the
git purls to download items is also a separate patch, and third patch
for adding the PURLs to all non-git downloads (or drop that one if you
are OK and don't want to argue for it).


>
> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/classes-recipe/cargo_common.bbclass |   3 +
>  meta/classes-recipe/cpan.bbclass         |  11 ++
>  meta/classes-recipe/go-mod.bbclass       |   6 +
>  meta/classes-recipe/npm.bbclass          |   7 +
>  meta/classes-recipe/pypi.bbclass         |   6 +-
>  meta/classes/create-spdx-3.0.bbclass     |   7 +
>  meta/lib/oe/spdx30_tasks.py              | 175 +++++++++++++++++------
>  7 files changed, 172 insertions(+), 43 deletions(-)
>
> diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
> index bc44ad7918..0d3edfe4a7 100644
> --- a/meta/classes-recipe/cargo_common.bbclass
> +++ b/meta/classes-recipe/cargo_common.bbclass
> @@ -240,3 +240,6 @@ EXPORT_FUNCTIONS do_configure
>  # https://github.com/rust-lang/libc/issues/3223
>  # https://github.com/rust-lang/libc/pull/3175
>  INSANE_SKIP:append = " 32bit-time"
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:cargo/${BPN}@${PV} "
> diff --git a/meta/classes-recipe/cpan.bbclass b/meta/classes-recipe/cpan.bbclass
> index bb76a5b326..dbf44da9d2 100644
> --- a/meta/classes-recipe/cpan.bbclass
> +++ b/meta/classes-recipe/cpan.bbclass
> @@ -68,4 +68,15 @@ cpan_do_install () {
>         done
>  }
>
> +# Generate ecosystem-specific Package URL for SPDX
> +def cpan_spdx_name(d):
> +    bpn = d.getVar('BPN')
> +    if bpn.startswith('perl-'):
> +        return bpn[5:]
> +    elif bpn.startswith('libperl-'):
> +        return bpn[8:]
> +    return bpn
> +
> +SPDX_PACKAGE_URLS =+ "pkg:cpan/${@cpan_spdx_name(d)}@${PV} "
> +
>  EXPORT_FUNCTIONS do_configure do_compile do_install
> diff --git a/meta/classes-recipe/go-mod.bbclass b/meta/classes-recipe/go-mod.bbclass
> index a15dda8f0e..5b3cb2d8b9 100644
> --- a/meta/classes-recipe/go-mod.bbclass
> +++ b/meta/classes-recipe/go-mod.bbclass
> @@ -32,3 +32,9 @@ do_compile[dirs] += "${B}/src/${GO_WORKDIR}"
>  # Make go install unpack the module zip files in the module cache directory
>  # before the license directory is polulated with license files.
>  addtask do_compile before do_populate_lic
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "

These lines are duplicated

> diff --git a/meta/classes-recipe/npm.bbclass b/meta/classes-recipe/npm.bbclass
> index 344e8b4bec..7bb791d543 100644
> --- a/meta/classes-recipe/npm.bbclass
> +++ b/meta/classes-recipe/npm.bbclass
> @@ -354,4 +354,11 @@ FILES:${PN} += " \
>      ${nonarch_libdir} \
>  "
>
> +# Generate ecosystem-specific Package URL for SPDX
> +def npm_spdx_name(d):
> +    bpn = d.getVar('BPN')
> +    return bpn[5:] if bpn.startswith('node-') else bpn
> +
> +SPDX_PACKAGE_URLS =+ "pkg:npm/${@npm_spdx_name(d)}@${PV} "
> +
>  EXPORT_FUNCTIONS do_configure do_compile do_install
> diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
> index 9d46c035f6..e2d054af6d 100644
> --- a/meta/classes-recipe/pypi.bbclass
> +++ b/meta/classes-recipe/pypi.bbclass
> @@ -43,7 +43,8 @@ SECTION = "devel/python"
>  SRC_URI:prepend = "${PYPI_SRC_URI} "
>  S = "${UNPACKDIR}/${PYPI_PACKAGE}-${PV}"
>
> -UPSTREAM_CHECK_PYPI_PACKAGE ?= "${PYPI_PACKAGE}"
> +# Replace any '_' characters in the pypi URI with '-'s to follow the PyPi website naming conventions
> +UPSTREAM_CHECK_PYPI_PACKAGE ?= "${@pypi_normalize(d)}"

I don't think we want to change this line? Or if we do it needs to be
a separate patch with rationalization.

>
>  # Use the simple repository API rather than the potentially unstable project URL
>  # More information on the pypi API specification is avaialble here:
> @@ -54,3 +55,6 @@ UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${@pypi_normalize(d)}/"
>  UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz|zip|tar\.bz2)"
>
>  CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:pypi/${@pypi_normalize(d)}@${PV} "

Hmm, this is supposed to be the actual name on PyPi, which is the same
as the UPSTREAM_CHECK_PYPI_PACKAGE definition... which is tricky. It
would be nice if users could set one variable that was the name as
known to PyPi to populate both UPSTREAM_CHECK_PYPI_PACKAGE and
SPDX_PACKAGE_URLS

CC Tim for his thoughts.



> diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
> index 9a6606dce6..265dc525bc 100644
> --- a/meta/classes/create-spdx-3.0.bbclass
> +++ b/meta/classes/create-spdx-3.0.bbclass
> @@ -156,6 +156,13 @@ SPDX_RECIPE_SBOM_NAME ?= "${PN}-recipe-sbom"
>  SPDX_RECIPE_SBOM_NAME[doc] = "The name of output recipe SBoM when using \
>      create_recipe_sbom"
>
> +SPDX_GIT_PURL_MAPPINGS ??= ""
> +SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
> +    mappings to configure PURL generation for Git source downloads. \
> +    For example, "gitlab.example.com:pkg:gitlab" maps repositories hosted \
> +    on gitlab.example.com to the pkg:gitlab PURL type. \
> +    github.com is always mapped to pkg:github by default."
> +
>  IMAGE_CLASSES:append = " create-spdx-image-3.0"
>  SDK_CLASSES += "create-spdx-sdk-3.0"
>
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 62a00069df..6f0bdba975 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -14,6 +14,7 @@ import oe.spdx_common
>  import oe.sdk
>  import os
>  import re
> +import urllib.parse
>
>  from contextlib import contextmanager
>  from datetime import datetime, timezone
> @@ -384,6 +385,120 @@ def collect_dep_sources(dep_objsets, dest):
>              index_sources_by_hash(e.to, dest)
>
>
> +def _generate_git_purl(d, download_location, srcrev):
> +    """Generate a Package URL for a Git source from its download location.
> +
> +    Parses the Git URL to identify the hosting service and generates the
> +    appropriate PURL type. Supports github.com by default and custom
> +    mappings via SPDX_GIT_PURL_MAPPINGS.
> +
> +    Returns the PURL string or None if no mapping matches.
> +    """
> +    if not download_location or not download_location.startswith('git+'):
> +        return None
> +
> +    git_url = download_location[4:]  # Remove 'git+' prefix
> +
> +    # Default handler: github.com
> +    git_purl_handlers = {
> +        'github.com': 'pkg:github',
> +    }
> +
> +    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
> +    # Format: "domain1:purl_type1 domain2:purl_type2"
> +    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
> +    if custom_mappings:
> +        for mapping in custom_mappings.split():
> +            parts = mapping.split(':', 1)
> +            if len(parts) == 2:
> +                git_purl_handlers[parts[0]] = parts[1]
> +                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
> +            else:
> +                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
> +
> +    try:
> +        parsed = urllib.parse.urlparse(git_url)
> +    except Exception:
> +        return None
> +
> +    hostname = parsed.hostname
> +    if not hostname:
> +        return None
> +
> +    for domain, purl_type in git_purl_handlers.items():
> +        if hostname == domain:
> +            path = parsed.path.strip('/')
> +            path_parts = path.split('/')
> +            if len(path_parts) >= 2:
> +                owner = path_parts[0]
> +                repo = path_parts[1].replace('.git', '')
> +                return f"{purl_type}/{owner}/{repo}@{srcrev}"
> +            break
> +
> +    return None
> +
> +
> +def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
> +    """Enrich a source download package with version, PURL, and external refs.
> +
> +    Extracts version from SRCREV for Git sources, generates PURLs for
> +    known hosting services, and adds external references for VCS,
> +    distribution URLs, and homepage.
> +    """
> +    version = None
> +    purl = None
> +
> +    if fd.type == "git":
> +        # Use full SHA-1 from fd.revision
> +        srcrev = getattr(fd, 'revision', None)
> +        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
> +            version = srcrev
> +
> +        # Generate PURL for Git hosting services
> +        download_location = getattr(dl, 'software_downloadLocation', None)
> +        if version and download_location:
> +            purl = _generate_git_purl(d, download_location, version)
> +    else:
> +        # Use ecosystem PURL from SPDX_PACKAGE_URLS if available
> +        package_urls = (d.getVar('SPDX_PACKAGE_URLS') or '').split()
> +        for url in package_urls:
> +            if not url.startswith('pkg:yocto'):
> +                purl = url
> +                break

The git purls (before the else:) are fine to keep; they are definitely
correct. However, I don't think that we want to add the _recipe_ purl
to all download files at this time. It's very easy for that to be
incorrect or misleading (e.g. local files). We already have the PURL
on the recipe, so copying those same purls to the download files seems
of little value.

There might perhaps be some space to determine the "primary source" of
the recipe (e.g. when it's a tarball et. al.) and add a PURL to that,
or some mechanism to allow recipe writers set the PURL for a specific
SRC_URI, but as-written I don't think we can do this.

> +
> +    if version:
> +        dl.software_packageVersion = version
> +
> +    if purl:
> +        dl.software_packageUrl = purl
> +
> +    # Add external references
> +    download_location = getattr(dl, 'software_downloadLocation', None)
> +    if download_location and isinstance(download_location, str):
> +        dl.externalRef = dl.externalRef or []
> +
> +        if download_location.startswith('git+'):
> +            # VCS reference for Git repositories
> +            git_url = download_location[4:]
> +            if '@' in git_url:
> +                git_url = git_url.split('@')[0]
> +
> +            dl.externalRef.append(
> +                oe.spdx30.ExternalRef(
> +                    externalRefType=oe.spdx30.ExternalRefType.vcs,
> +                    locator=[git_url],
> +                )
> +            )
> +        elif download_location.startswith(('http://', 'https://', 'ftp://')):
> +            # Distribution reference for tarball/archive downloads
> +            dl.externalRef.append(
> +                oe.spdx30.ExternalRef(
> +                    externalRefType=oe.spdx30.ExternalRefType.altDownloadLocation,
> +                    locator=[download_location],
> +                )
> +            )
> +
> +
>  def add_download_files(d, objset):
>      inputs = set()
>
> @@ -447,10 +562,14 @@ def add_download_files(d, objset):
>                  )
>              )
>
> +            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
> +
>              if fd.method.supports_checksum(fd):
>                  # TODO Need something better than hard coding this
>                  for checksum_id in ["sha256", "sha1"]:
> -                    expected_checksum = getattr(fd, "%s_expected" % checksum_id, None)
> +                    expected_checksum = getattr(
> +                        fd, "%s_expected" % checksum_id, None
> +                    )
>                      if expected_checksum is None:
>                          continue
>
> @@ -506,7 +625,6 @@ def get_is_native(d):
>
>  def create_recipe_spdx(d):
>      deploydir = Path(d.getVar("SPDXRECIPEDEPLOY"))
> -    deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
>      pn = d.getVar("PN")
>
>      license_data = oe.spdx_common.load_spdx_license_data(d)
> @@ -541,20 +659,6 @@ def create_recipe_spdx(d):
>
>      set_purls(recipe, (d.getVar("SPDX_PACKAGE_URLS") or "").split())
>
> -    # TODO: This doesn't work before do_unpack because the license text has to
> -    # be available for recipes with NO_GENERIC_LICENSE
> -    # recipe_spdx_license = add_license_expression(
> -    #    d,
> -    #    recipe_objset,
> -    #    d.getVar("LICENSE"),
> -    #    license_data,
> -    # )
> -    # recipe_objset.new_relationship(
> -    #    [recipe],
> -    #    oe.spdx30.RelationshipType.hasDeclaredLicense,
> -    #    [oe.sbom30.get_element_link_id(recipe_spdx_license)],
> -    # )
> -

Please don't remove my comments

>      if val := d.getVar("HOMEPAGE"):
>          recipe.software_homePage = val
>
> @@ -588,7 +692,6 @@ def create_recipe_spdx(d):
>              sorted(oe.sbom30.get_element_link_id(dep) for dep in dep_recipes),
>          )
>
> -    # Add CVEs
>      cve_by_status = {}
>      if include_vex != "none":
>          patched_cves = oe.cve_check.get_patched_cves(d)
> @@ -598,8 +701,6 @@ def create_recipe_spdx(d):
>              description = patched_cve.get("justification", None)
>              resources = patched_cve.get("resource", [])
>
> -            # If this CVE is fixed upstream, skip it unless all CVEs are
> -            # specified.
>              if include_vex != "all" and detail in (
>                  "fixed-version",
>                  "cpe-stable-backport",
> @@ -692,7 +793,6 @@ def create_recipe_spdx(d):
>
>
>  def load_recipe_spdx(d):
> -
>      return oe.sbom30.find_root_obj_in_jsonld(
>          d,
>          "static",
> @@ -717,10 +817,8 @@ def create_spdx(d):
>
>      pn = d.getVar("PN")
>      deploydir = Path(d.getVar("SPDXDEPLOY"))
> -    deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))

This is removed because it's unused? In the future, please use
separate patches for that sort of thing

>      spdx_workdir = Path(d.getVar("SPDXWORK"))
>      include_sources = d.getVar("SPDX_INCLUDE_SOURCES") == "1"
> -    pkg_arch = d.getVar("SSTATE_PKGARCH")
>      is_native = get_is_native(d)
>
>      recipe, recipe_objset = load_recipe_spdx(d)
> @@ -783,7 +881,6 @@ def create_spdx(d):
>      dep_objsets, dep_builds = collect_dep_objsets(
>          d, direct_deps, "builds", "build-", oe.spdx30.build_Build
>      )
> -

I don't particularly mind formatting changes, as long as it makes them
PEP8 compliant, but I do find vertical space helps read the code so
please don't remove it. It also just adds unnnecessary (off-topic)
changes to the code review that we have to look at that.

>      if dep_builds:
>          build_objset.new_scoped_relationship(
>              [build],
> @@ -919,9 +1016,7 @@ def create_spdx(d):
>
>              # Add concluded license relationship if manually set
>              # Only add when license analysis has been explicitly performed
> -            concluded_license_str = d.getVar(
> -                "SPDX_CONCLUDED_LICENSE:%s" % package
> -            ) or d.getVar("SPDX_CONCLUDED_LICENSE")
> +            concluded_license_str = d.getVar("SPDX_CONCLUDED_LICENSE:%s" % package) or d.getVar("SPDX_CONCLUDED_LICENSE")
>              if concluded_license_str:
>                  concluded_spdx_license = add_license_expression(
>                      d, build_objset, concluded_license_str, license_data
> @@ -1011,13 +1106,12 @@ def create_spdx(d):
>                  status = "enabled" if feature in enabled else "disabled"
>                  build.build_parameter.append(
>                      oe.spdx30.DictionaryEntry(
> -                        key=f"PACKAGECONFIG:{feature}", value=status
> +                        key=f"PACKAGECONFIG:{feature}",
> +                        value=status
>                      )
>                  )
>
> -            bb.note(
> -                f"Added PACKAGECONFIG entries: {len(enabled)} enabled, {len(disabled)} disabled"
> -            )
> +            bb.note(f"Added PACKAGECONFIG entries: {len(enabled)} enabled, {len(disabled)} disabled")
>
>      oe.sbom30.write_recipe_jsonld_doc(d, build_objset, "builds", deploydir)
>
> @@ -1025,9 +1119,7 @@ def create_spdx(d):
>  def create_package_spdx(d):
>      deploy_dir_spdx = Path(d.getVar("DEPLOY_DIR_SPDX"))
>      deploydir = Path(d.getVar("SPDXRUNTIMEDEPLOY"))
> -
>      direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_spdx")
> -
>      providers = oe.spdx_common.collect_package_providers(d, direct_deps)
>      pkg_arch = d.getVar("SSTATE_PKGARCH")
>
> @@ -1205,15 +1297,15 @@ def write_bitbake_spdx(d):
>  def collect_build_package_inputs(d, objset, build, packages, files_by_hash=None):
>      import oe.sbom30
>
> -    direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_spdx")
> -
> +    direct_deps = oe.spdx_common.collect_direct_deps(d, "do_create_package_spdx")

Why did you change this?

>      providers = oe.spdx_common.collect_package_providers(d, direct_deps)
>
>      build_deps = set()
> +    missing_providers = set()
>
>      for name in sorted(packages.keys()):
>          if name not in providers:
> -            bb.note(f"Unable to find SPDX provider for '{name}'")
> +            missing_providers.add(name)
>              continue
>
>          pkg_name, pkg_hashfn = providers[name]
> @@ -1232,6 +1324,11 @@ def collect_build_package_inputs(d, objset, build, packages, files_by_hash=None)
>              for h, f in pkg_objset.by_sha256_hash.items():
>                  files_by_hash.setdefault(h, set()).update(f)
>
> +    if missing_providers:
> +        bb.fatal(
> +            f"Unable to find SPDX provider(s) for: {', '.join(sorted(missing_providers))}"
> +        )
> +

This is a good change, but off-topic, and should be its own patch

>      if build_deps:
>          objset.new_scoped_relationship(
>              [build],
> @@ -1390,6 +1487,7 @@ def create_image_spdx(d):
>
>                  set_timestamp_now(d, a, "builtTime")
>
> +
>          if artifacts:
>              objset.new_scoped_relationship(
>                  [image_build],
> @@ -1583,10 +1681,3 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
>      oe.sbom30.write_jsonld_doc(
>          d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
>      )
> -    sbom_name = d.getVar("SPDX_RECIPE_SBOM_NAME")
> -
> -    recipe, recipe_objset = load_recipe_spdx(d)
> -
> -    objset, sbom = oe.sbom30.create_sbom(d, sbom_name, [recipe], [recipe_objset])
> -
> -    oe.sbom30.write_jsonld_doc(d, objset, deploydir / (sbom_name + ".spdx.json"))

Why is this removed?

> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment
  2026-03-23 21:07 ` [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
  2026-03-24 10:26   ` Richard Purdie
@ 2026-03-24 14:48   ` Joshua Watt
  1 sibling, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-24 14:48 UTC (permalink / raw)
  To: Stefano Tondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Mon, Mar 23, 2026 at 3:07 PM Stefano Tondo <stondo@gmail.com> wrote:
>
> Add comprehensive tests for the new source download SPDX features:
>
> test_download_location_defensive_handling:
>   Verify that packages with no download location (e.g. packagegroups,
>   images, virtual providers) are handled gracefully without crashing
>   the SPDX generation pipeline.
>
> test_version_extraction_patterns:
>   Verify that Git source packages get SRCREV as their version in the
>   SPDX output, rather than the recipe PV.
>
> test_packageconfig_spdx:
>   Verify that PACKAGECONFIG features are correctly recorded in SPDX
>   build parameters when SPDX_INCLUDE_PACKAGECONFIG is enabled.

The tests look good, but the merge went wrong

>
> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/lib/oeqa/selftest/cases/spdx.py | 104 +++++++++++++++++++++------
>  1 file changed, 83 insertions(+), 21 deletions(-)
>
> diff --git a/meta/lib/oeqa/selftest/cases/spdx.py b/meta/lib/oeqa/selftest/cases/spdx.py
> index af1144c1e5..140d3debba 100644
> --- a/meta/lib/oeqa/selftest/cases/spdx.py
> +++ b/meta/lib/oeqa/selftest/cases/spdx.py
> @@ -141,29 +141,15 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>      SPDX_CLASS = "create-spdx-3.0"
>
>      def test_base_files(self):
> -        self.check_recipe_spdx(
> -            "base-files",
> -            "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/static/static-base-files.spdx.json",
> -            task="create_recipe_spdx",
> -        )

I think your merge went wrong here, since you removed my changes
instead of keeping them :)

>          self.check_recipe_spdx(
>              "base-files",
>              "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/packages/package-base-files.spdx.json",
>          )
>
> -    def test_world_sbom(self):
> -        objset = self.check_recipe_spdx(
> -            "meta-world-recipe-sbom",
> -            "{DEPLOY_DIR_IMAGE}/world-recipe-sbom.spdx.json",
> -        )
> -
> -        # Document should be fully linked
> -        self.check_objset_missing_ids(objset)
> -

ditto.

>      def test_gcc_include_source(self):
>          objset = self.check_recipe_spdx(
>              "gcc",
> -            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-gcc.spdx.json",
> +            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/recipes/recipe-gcc.spdx.json",
>              extraconf="""\
>                  SPDX_INCLUDE_SOURCES = "1"
>                  """,
> @@ -176,12 +162,12 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>              if software_file.name == filename:
>                  found = True
>                  self.logger.info(
> -                    f"The spdxId of {filename} in build-gcc.spdx.json is {software_file.spdxId}"
> +                    f"The spdxId of {filename} in recipe-gcc.spdx.json is {software_file.spdxId}"
>                  )
>                  break
>
>          self.assertTrue(
> -            found, f"Not found source file {filename} in build-gcc.spdx.json\n"
> +            found, f"Not found source file {filename} in recipe-gcc.spdx.json\n"
>          )
>
>      def test_core_image_minimal(self):
> @@ -319,7 +305,7 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>          # This will fail with NameError if new_annotation() is called incorrectly
>          objset = self.check_recipe_spdx(
>              "base-files",
> -            "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/builds/build-base-files.spdx.json",
> +            "{DEPLOY_DIR_SPDX}/{MACHINE_ARCH}/recipes/recipe-base-files.spdx.json",
>              extraconf=textwrap.dedent(
>                  f"""\
>                  ANNOTATION1 = "{ANNOTATION_VAR1}"
> @@ -374,8 +360,8 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>
>      def test_kernel_config_spdx(self):
>          kernel_recipe = get_bb_var("PREFERRED_PROVIDER_virtual/kernel")
> -        spdx_file = f"build-{kernel_recipe}.spdx.json"
> -        spdx_path = f"{{DEPLOY_DIR_SPDX}}/{{SSTATE_PKGARCH}}/builds/{spdx_file}"
> +        spdx_file = f"recipe-{kernel_recipe}.spdx.json"
> +        spdx_path = f"{{DEPLOY_DIR_SPDX}}/{{SSTATE_PKGARCH}}/recipes/{spdx_file}"
>
>          # Make sure kernel is configured first
>          bitbake(f"-c configure {kernel_recipe}")
> @@ -383,7 +369,7 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>          objset = self.check_recipe_spdx(
>              kernel_recipe,
>              spdx_path,
> -            task="do_create_spdx",
> +            task="do_create_kernel_config_spdx",
>              extraconf="""\
>                  INHERIT += "create-spdx"
>                  SPDX_INCLUDE_KERNEL_CONFIG = "1"
> @@ -428,3 +414,79 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>                  value, ["enabled", "disabled"],
>                  f"Unexpected PACKAGECONFIG value '{value}' for {key}"
>              )
> +
> +    def test_download_location_defensive_handling(self):
> +        """Test that download_location handling is defensive.
> +
> +        Verifies SPDX generation succeeds and external references are
> +        properly structured when download_location retrieval works.
> +        """
> +        objset = self.check_recipe_spdx(
> +            "m4",
> +            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-m4.spdx.json",
> +        )
> +
> +        found_external_refs = False
> +        for pkg in objset.foreach_type(oe.spdx30.software_Package):
> +            if pkg.externalRef:
> +                found_external_refs = True
> +                for ref in pkg.externalRef:
> +                    self.assertIsNotNone(ref.externalRefType)
> +                    self.assertIsNotNone(ref.locator)
> +                    self.assertGreater(len(ref.locator), 0, "Locator should have at least one entry")
> +                    for loc in ref.locator:
> +                        self.assertIsInstance(loc, str)
> +                break
> +
> +        self.logger.info(
> +            f"External references {'found' if found_external_refs else 'not found'} "
> +            f"in SPDX output (defensive handling verified)"
> +        )
> +
> +    def test_version_extraction_patterns(self):
> +        """Test that version extraction works for various package formats.
> +
> +        Verifies that Git source downloads carry extracted versions and that
> +        the reported version strings are well-formed.
> +        """
> +        objset = self.check_recipe_spdx(
> +            "opkg-utils",
> +            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-opkg-utils.spdx.json",
> +        )
> +
> +        # Collect all packages with versions
> +        packages_with_versions = []
> +        for pkg in objset.foreach_type(oe.spdx30.software_Package):
> +            if pkg.software_packageVersion:
> +                packages_with_versions.append((pkg.name, pkg.software_packageVersion))
> +
> +        self.assertGreater(
> +            len(packages_with_versions), 0,
> +            "Should find packages with extracted versions"
> +        )
> +
> +        for name, version in packages_with_versions:
> +            self.assertRegex(
> +                version,
> +                r"^[0-9a-f]{40}$",
> +                f"Expected Git source version for {name} to be a full SHA-1",
> +            )
> +
> +        self.logger.info(f"Found {len(packages_with_versions)} packages with versions")
> +
> +        # Log some examples for debugging
> +        for name, version in packages_with_versions[:5]:
> +            self.logger.info(f"  {name}: {version}")
> +
> +        # Verify that versions follow expected patterns
> +        for name, version in packages_with_versions:
> +            # Version should not be empty
> +            self.assertIsNotNone(version)
> +            self.assertNotEqual(version, "")
> +
> +            # Version should contain digits
> +            self.assertRegex(
> +                version,
> +                r'\d',
> +                f"Version '{version}' for package '{name}' should contain digits"
> +            )
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v16 0/5] spdx30: PURL and source download enrichment
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (3 preceding siblings ...)
  2026-03-24 13:29   ` [OE-core][PATCH v14 4/4] oeqa/selftest: Add tests for source download enrichment stondo
@ 2026-03-24 17:12   ` Stefano Tondo
  2026-03-24 17:12   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:12 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

This series enhances Yocto's SPDX 3.0 output with ecosystem-specific
Package URLs, Git source version tracking, and configurable SBOM
generation improvements.

Changes since v14 (addressing Joshua's review of v14 patches 3/4 and 4/4):
- Split the monolithic "Enrich source downloads" patch into two focused
  patches: ecosystem PURLs (patch 3, bbclass-only) and Git download
  enrichment (patch 4, spdx30_tasks.py additions-only)
- Removed ALL extraneous changes: formatting, comment removals,
  blank-line deletions, variable removals, and off-topic refactors
- Fixed go-mod.bbclass duplicate SPDX_PACKAGE_URLS line
- Reverted pypi.bbclass UPSTREAM_CHECK_PYPI_PACKAGE change
- Removed the else-branch that copied recipe ecosystem PURLs to
  non-git download files (per Joshua's feedback)
- Reverted do_create_spdx -> do_create_package_spdx change in
  collect_build_package_inputs
- Reverted bb.note -> bb.fatal for missing SPDX providers
- Restored all removed TODO comments and blank lines
- Patches 2-5 are now strictly additions-only (0 deletions)
- Tests unchanged (additions-only, all 12 master tests preserved)

Stefano Tondo (5):
  spdx30: Add configurable file exclusion pattern support
  spdx30: Add supplier support for image and SDK SBOMs
  spdx30: Add ecosystem PURLs for recipe classes
  spdx30: Add Git version and PURL to source downloads
  oeqa/selftest: Add tests for source download enrichment

 meta/classes-recipe/cargo_common.bbclass |   3 +
 meta/classes-recipe/cpan.bbclass         |  11 ++
 meta/classes-recipe/go-mod.bbclass       |   3 +
 meta/classes-recipe/npm.bbclass          |   7 +
 meta/classes-recipe/pypi.bbclass         |   3 +
 meta/classes/create-spdx-3.0.bbclass     |  17 ++
 meta/classes/spdx-common.bbclass         |   7 +
 meta/lib/oe/spdx30_tasks.py              | 202 ++++++++++++++++++++---
 meta/lib/oeqa/selftest/cases/spdx.py     |  76 +++++++++
 9 files changed, 302 insertions(+), 27 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (4 preceding siblings ...)
  2026-03-24 17:12   ` [PATCH v16 0/5] spdx30: PURL and " Stefano Tondo
@ 2026-03-24 17:12   ` Stefano Tondo
  2026-03-24 17:12   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:12 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_FILE_EXCLUDE_PATTERNS variable that allows filtering files from
SPDX output by regex matching. The variable accepts a space-separated
list of Python regular expressions; files whose paths match any pattern
(via re.search) are excluded.

When empty (the default), no filtering is applied and all files are
included, preserving existing behavior.

This enables users to reduce SBOM size by excluding files that are not
relevant for compliance (e.g., test files, object files, patches).

Excluded files are tracked in a set returned from add_package_files()
and passed to get_package_sources_from_debug(), which uses the set for
precise cross-checking rather than re-evaluating patterns.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/spdx-common.bbclass |  7 +++
 meta/lib/oe/spdx30_tasks.py      | 80 +++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
index 83f05579b6..40701730a6 100644
--- a/meta/classes/spdx-common.bbclass
+++ b/meta/classes/spdx-common.bbclass
@@ -82,6 +82,13 @@ SPDX_MULTILIB_SSTATE_ARCHS[doc] = "The list of sstate architectures to consider
     when collecting SPDX dependencies. This includes multilib architectures when \
     multilib is enabled. Defaults to SSTATE_ARCHS."
 
+SPDX_FILE_EXCLUDE_PATTERNS ??= ""
+SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of Python regular \
+    expressions to exclude files from SPDX output. Files whose paths match \
+    any pattern (via re.search) will be filtered out. Defaults to empty \
+    (no filtering). Example: \
+    SPDX_FILE_EXCLUDE_PATTERNS = '\\.patch$ \\.diff$ /test/ \\.pyc$ \\.o$'"
+
 python () {
     from oe.cve_check import extend_cve_status
     extend_cve_status(d)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 353d783fa2..68ed821a8c 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -13,6 +13,7 @@ import oe.spdx30
 import oe.spdx_common
 import oe.sdk
 import os
+import re
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -157,17 +158,27 @@ def add_package_files(
     file_counter = 1
     if not os.path.exists(topdir):
         bb.note(f"Skip {topdir}")
-        return spdx_files
+        return spdx_files, set()
 
     check_compiled_sources = d.getVar("SPDX_INCLUDE_COMPILED_SOURCES") == "1"
     if check_compiled_sources:
         compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
         bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
 
+    exclude_patterns = [
+        re.compile(pattern)
+        for pattern in (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
+    ]
+    excluded_files = set()
+
     for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
-        dirs[:] = [d for d in dirs if d not in ignore_dirs]
+        dirs[:] = [directory for directory in dirs if directory not in ignore_dirs]
         if subdir == str(topdir):
-            dirs[:] = [d for d in dirs if d not in ignore_top_level_dirs]
+            dirs[:] = [
+                directory
+                for directory in dirs
+                if directory not in ignore_top_level_dirs
+            ]
 
         dirs.sort()
         files.sort()
@@ -177,14 +188,19 @@ def add_package_files(
                 continue
 
             filename = str(filepath.relative_to(topdir))
+
+            if exclude_patterns and any(
+                pattern.search(filename) for pattern in exclude_patterns
+            ):
+                excluded_files.add(filename)
+                continue
+
             file_purposes = get_purposes(filepath)
 
-            # Check if file is compiled
-            if check_compiled_sources:
-                if not oe.spdx_common.is_compiled_source(
-                    filename, compiled_sources, types
-                ):
-                    continue
+            if check_compiled_sources and not oe.spdx_common.is_compiled_source(
+                filename, compiled_sources, types
+            ):
+                continue
 
             spdx_file = objset.new_file(
                 get_spdxid(file_counter),
@@ -218,12 +234,15 @@ def add_package_files(
 
     bb.debug(1, "Added %d files to %s" % (len(spdx_files), objset.doc._id))
 
-    return spdx_files
+    return spdx_files, excluded_files
 
 
 def get_package_sources_from_debug(
-    d, package, package_files, sources, source_hash_cache
+    d, package, package_files, sources, source_hash_cache, excluded_files=None
 ):
+    if excluded_files is None:
+        excluded_files = set()
+
     def file_path_match(file_path, pkg_file):
         if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
             return True
@@ -256,6 +275,12 @@ def get_package_sources_from_debug(
             continue
 
         if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
+            if file_path.lstrip("/") in excluded_files:
+                bb.debug(
+                    1,
+                    f"Skipping debug source lookup for excluded file {file_path} in {package}",
+                )
+                continue
             bb.fatal(
                 "No package file found for %s in %s; SPDX found: %s"
                 % (str(file_path), package, " ".join(p.name for p in package_files))
@@ -737,7 +762,7 @@ def create_spdx(d):
         bb.debug(1, "Adding source files to SPDX")
         oe.spdx_common.get_patched_src(d)
 
-        files = add_package_files(
+        files, _ = add_package_files(
             d,
             build_objset,
             spdx_workdir,
@@ -909,7 +934,7 @@ def create_spdx(d):
                 )
 
             bb.debug(1, "Adding package files to SPDX for package %s" % pkg_name)
-            package_files = add_package_files(
+            package_files, excluded_files = add_package_files(
                 d,
                 pkg_objset,
                 pkgdest / package,
@@ -932,7 +957,8 @@ def create_spdx(d):
 
             if include_sources:
                 debug_sources = get_package_sources_from_debug(
-                    d, package, package_files, dep_sources, source_hash_cache
+                    d, package, package_files, dep_sources, source_hash_cache,
+                    excluded_files=excluded_files,
                 )
                 debug_source_ids |= set(
                     oe.sbom30.get_element_link_id(d) for d in debug_sources
@@ -944,7 +970,7 @@ def create_spdx(d):
 
     if include_sources:
         bb.debug(1, "Adding sysroot files to SPDX")
-        sysroot_files = add_package_files(
+        sysroot_files, _ = add_package_files(
             d,
             build_objset,
             d.expand("${COMPONENTS_DIR}/${PACKAGE_ARCH}/${PN}"),
@@ -1326,18 +1352,18 @@ def create_image_spdx(d):
             image_filename = image["filename"]
             image_path = image_deploy_dir / image_filename
             if os.path.isdir(image_path):
-                a = add_package_files(
-                    d,
-                    objset,
-                    image_path,
-                    lambda file_counter: objset.new_spdxid(
-                        "imagefile", str(file_counter)
-                    ),
-                    lambda filepath: [],
-                    license_data=None,
-                    ignore_dirs=[],
-                    ignore_top_level_dirs=[],
-                    archive=None,
+                a, _ = add_package_files(
+                        d,
+                        objset,
+                        image_path,
+                        lambda file_counter: objset.new_spdxid(
+                            "imagefile", str(file_counter)
+                        ),
+                        lambda filepath: [],
+                        license_data=None,
+                        ignore_dirs=[],
+                        ignore_top_level_dirs=[],
+                        archive=None,
                 )
                 artifacts.extend(a)
             else:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (5 preceding siblings ...)
  2026-03-24 17:12   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
@ 2026-03-24 17:12   ` Stefano Tondo
  2026-03-24 17:12   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
                     ` (8 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:12 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_IMAGE_SUPPLIER and SPDX_SDK_SUPPLIER variables that allow
setting a supplier agent on image and SDK SBOM root elements using
the suppliedBy property.

These follow the existing SPDX_PACKAGE_SUPPLIER pattern and use the
standard agent variable system to define supplier information.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/create-spdx-3.0.bbclass | 10 ++++++++++
 meta/lib/oe/spdx30_tasks.py          | 20 ++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 7515f460c3..9a6606dce6 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -124,6 +124,16 @@ SPDX_ON_BEHALF_OF[doc] = "The base variable name to describe the Agent on who's
 SPDX_PACKAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
     is supplying artifacts produced by the build"
 
+SPDX_IMAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the image SBOM. The supplier will be set on all root elements \
+    of the image SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the image SBOM."
+
+SPDX_SDK_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the SDK SBOM. The supplier will be set on all root elements \
+    of the SDK SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the SDK SBOM."
+
 SPDX_PACKAGE_VERSION ??= "${PV}"
 SPDX_PACKAGE_VERSION[doc] = "The version of a package, software_packageVersion \
     in software_Package"
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 68ed821a8c..51e10befba 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -1449,6 +1449,16 @@ def create_image_sbom_spdx(d):
 
     objset, sbom = oe.sbom30.create_sbom(d, image_name, root_elements)
 
+    # Set supplier on root elements if SPDX_IMAGE_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_IMAGE_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(d, objset, spdx_path)
 
     def make_image_link(target_path, suffix):
@@ -1560,6 +1570,16 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
         d, toolchain_outputname, sorted(list(files)), [rootfs_objset]
     )
 
+    # Set supplier on root elements if SPDX_SDK_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_SDK_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(
         d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
     )
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (6 preceding siblings ...)
  2026-03-24 17:12   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
@ 2026-03-24 17:12   ` Stefano Tondo
  2026-03-24 17:12   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
                     ` (7 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:12 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_PACKAGE_URLS to recipe classes to generate ecosystem-specific
Package URLs for SPDX 3.0 SBOMs. This enables proper package
identification across different packaging ecosystems.

Classes updated:
- cargo_common.bbclass: pkg:cargo PURLs for Rust crates
- cpan.bbclass: pkg:cpan PURLs for Perl modules (with name normalization)
- go-mod.bbclass: pkg:golang PURLs for Go modules
- npm.bbclass: pkg:npm PURLs for Node.js packages (with name normalization)
- pypi.bbclass: pkg:pypi PURLs for Python packages (with name normalization)

The SPDX_PACKAGE_URLS variable is a space-separated list which
create-spdx-3.0 already reads via set_purls() to populate
software_packageUrl and externalIdentifier on recipe packages.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes-recipe/cargo_common.bbclass |  3 +++
 meta/classes-recipe/cpan.bbclass         | 11 +++++++++++
 meta/classes-recipe/go-mod.bbclass       |  3 +++
 meta/classes-recipe/npm.bbclass          |  7 +++++++
 meta/classes-recipe/pypi.bbclass         |  3 +++
 5 files changed, 27 insertions(+)

diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
index bc44ad7918..0d3edfe4a7 100644
--- a/meta/classes-recipe/cargo_common.bbclass
+++ b/meta/classes-recipe/cargo_common.bbclass
@@ -240,3 +240,6 @@ EXPORT_FUNCTIONS do_configure
 # https://github.com/rust-lang/libc/issues/3223
 # https://github.com/rust-lang/libc/pull/3175
 INSANE_SKIP:append = " 32bit-time"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:cargo/${BPN}@${PV} "
diff --git a/meta/classes-recipe/cpan.bbclass b/meta/classes-recipe/cpan.bbclass
index bb76a5b326..dbf44da9d2 100644
--- a/meta/classes-recipe/cpan.bbclass
+++ b/meta/classes-recipe/cpan.bbclass
@@ -68,4 +68,15 @@ cpan_do_install () {
 	done
 }
 
+# Generate ecosystem-specific Package URL for SPDX
+def cpan_spdx_name(d):
+    bpn = d.getVar('BPN')
+    if bpn.startswith('perl-'):
+        return bpn[5:]
+    elif bpn.startswith('libperl-'):
+        return bpn[8:]
+    return bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:cpan/${@cpan_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/go-mod.bbclass b/meta/classes-recipe/go-mod.bbclass
index a15dda8f0e..0f5835f26e 100644
--- a/meta/classes-recipe/go-mod.bbclass
+++ b/meta/classes-recipe/go-mod.bbclass
@@ -32,3 +32,6 @@ do_compile[dirs] += "${B}/src/${GO_WORKDIR}"
 # Make go install unpack the module zip files in the module cache directory
 # before the license directory is polulated with license files.
 addtask do_compile before do_populate_lic
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
diff --git a/meta/classes-recipe/npm.bbclass b/meta/classes-recipe/npm.bbclass
index 344e8b4bec..7bb791d543 100644
--- a/meta/classes-recipe/npm.bbclass
+++ b/meta/classes-recipe/npm.bbclass
@@ -354,4 +354,11 @@ FILES:${PN} += " \
     ${nonarch_libdir} \
 "
 
+# Generate ecosystem-specific Package URL for SPDX
+def npm_spdx_name(d):
+    bpn = d.getVar('BPN')
+    return bpn[5:] if bpn.startswith('node-') else bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:npm/${@npm_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
index 9d46c035f6..bd21557c60 100644
--- a/meta/classes-recipe/pypi.bbclass
+++ b/meta/classes-recipe/pypi.bbclass
@@ -54,3 +54,6 @@ UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${@pypi_normalize(d)}/"
 UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz|zip|tar\.bz2)"
 
 CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:pypi/${@pypi_normalize(d)}@${PV} "
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (7 preceding siblings ...)
  2026-03-24 17:12   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
@ 2026-03-24 17:12   ` Stefano Tondo
  2026-03-26 20:14     ` Joshua Watt
  2026-03-24 17:12   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
                     ` (6 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:12 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Enrich Git source download packages in the SPDX 3.0 output with:
- software_packageVersion set to the full SHA-1 commit hash
- software_packageUrl set to a PURL for known Git hosting services
- VCS external reference pointing to the repository URL

The PURL generation recognizes github.com by default and supports
additional hosting services via the SPDX_GIT_PURL_MAPPINGS variable
(format: 'domain:purl_type', e.g. 'gitlab.example.com:pkg:gitlab').

Only Git source downloads are enriched. Non-Git downloads are left
unchanged since their ecosystem PURLs are already set on the recipe
package by SPDX_PACKAGE_URLS from the previous patch.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/create-spdx-3.0.bbclass |   7 ++
 meta/lib/oe/spdx30_tasks.py          | 102 +++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 9a6606dce6..432adb14cd 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -156,6 +156,13 @@ SPDX_RECIPE_SBOM_NAME ?= "${PN}-recipe-sbom"
 SPDX_RECIPE_SBOM_NAME[doc] = "The name of output recipe SBoM when using \
     create_recipe_sbom"
 
+SPDX_GIT_PURL_MAPPINGS ??= ""
+SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
+    mappings to configure PURL generation for Git source downloads. \
+    For example, 'gitlab.example.com:pkg:gitlab' maps repositories hosted \
+    on gitlab.example.com to the pkg:gitlab PURL type. \
+    github.com is always mapped to pkg:github by default."
+
 IMAGE_CLASSES:append = " create-spdx-image-3.0"
 SDK_CLASSES += "create-spdx-sdk-3.0"
 
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 51e10befba..cd9672c18e 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -14,6 +14,7 @@ import oe.spdx_common
 import oe.sdk
 import os
 import re
+import urllib.parse
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -384,6 +385,105 @@ def collect_dep_sources(dep_objsets, dest):
             index_sources_by_hash(e.to, dest)
 
 
+
+def _generate_git_purl(d, download_location, srcrev):
+    """Generate a Package URL for a Git source from its download location.
+
+    Parses the Git URL to identify the hosting service and generates the
+    appropriate PURL type. Supports github.com by default and custom
+    mappings via SPDX_GIT_PURL_MAPPINGS.
+
+    Returns the PURL string or None if no mapping matches.
+    """
+    if not download_location or not download_location.startswith('git+'):
+        return None
+
+    git_url = download_location[4:]  # Remove 'git+' prefix
+
+    # Default handler: github.com
+    git_purl_handlers = {
+        'github.com': 'pkg:github',
+    }
+
+    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
+    # Format: "domain1:purl_type1 domain2:purl_type2"
+    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+    if custom_mappings:
+        for mapping in custom_mappings.split():
+            parts = mapping.split(':', 1)
+            if len(parts) == 2:
+                git_purl_handlers[parts[0]] = parts[1]
+                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
+            else:
+                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+    try:
+        parsed = urllib.parse.urlparse(git_url)
+    except Exception:
+        return None
+
+    hostname = parsed.hostname
+    if not hostname:
+        return None
+
+    for domain, purl_type in git_purl_handlers.items():
+        if hostname == domain:
+            path = parsed.path.strip('/')
+            path_parts = path.split('/')
+            if len(path_parts) >= 2:
+                owner = path_parts[0]
+                repo = path_parts[1].replace('.git', '')
+                return f"{purl_type}/{owner}/{repo}@{srcrev}"
+            break
+
+    return None
+
+
+def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
+    """Enrich a Git source download package with version, PURL, and external refs.
+
+    For Git sources, extracts the full SHA-1 from SRCREV as the version,
+    generates PURLs for known hosting services, and adds VCS external
+    references.
+    """
+    version = None
+    purl = None
+
+    if fd.type == "git":
+        # Use full SHA-1 from fd.revision
+        srcrev = getattr(fd, 'revision', None)
+        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
+            version = srcrev
+
+        # Generate PURL for Git hosting services
+        download_location = getattr(dl, 'software_downloadLocation', None)
+        if version and download_location:
+            purl = _generate_git_purl(d, download_location, version)
+
+    if version:
+        dl.software_packageVersion = version
+
+    if purl:
+        dl.software_packageUrl = purl
+
+    # Add VCS external reference for Git repositories
+    download_location = getattr(dl, 'software_downloadLocation', None)
+    if download_location and isinstance(download_location, str):
+        if download_location.startswith('git+'):
+            git_url = download_location[4:]
+            if '@' in git_url:
+                git_url = git_url.split('@')[0]
+
+            dl.externalRef = dl.externalRef or []
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.vcs,
+                    locator=[git_url],
+                )
+            )
+
+
+
 def add_download_files(d, objset):
     inputs = set()
 
@@ -447,6 +547,8 @@ def add_download_files(d, objset):
                 )
             )
 
+            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
+
             if fd.method.supports_checksum(fd):
                 # TODO Need something better than hard coding this
                 for checksum_id in ["sha256", "sha1"]:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (8 preceding siblings ...)
  2026-03-24 17:12   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
@ 2026-03-24 17:12   ` Stefano Tondo
  2026-03-24 17:14   ` [PATCH v16 0/5] spdx30: PURL and " Stefano Tondo
                     ` (5 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:12 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add two new test methods to SPDX30Check:

test_download_location_defensive_handling:
  Builds m4 and verifies that SPDX generation succeeds and any
  external references present are properly structured with valid
  types and locator strings.

test_version_extraction_patterns:
  Builds opkg-utils (a Git-based recipe) and verifies that source
  download packages carry the full SHA-1 commit hash as their
  software_packageVersion.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/lib/oeqa/selftest/cases/spdx.py | 76 ++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/meta/lib/oeqa/selftest/cases/spdx.py b/meta/lib/oeqa/selftest/cases/spdx.py
index af1144c1e5..9347e0bf7b 100644
--- a/meta/lib/oeqa/selftest/cases/spdx.py
+++ b/meta/lib/oeqa/selftest/cases/spdx.py
@@ -428,3 +428,79 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
                 value, ["enabled", "disabled"],
                 f"Unexpected PACKAGECONFIG value '{value}' for {key}"
             )
+
+    def test_download_location_defensive_handling(self):
+        """Test that download_location handling is defensive.
+
+        Verifies SPDX generation succeeds and external references are
+        properly structured when download_location retrieval works.
+        """
+        objset = self.check_recipe_spdx(
+            "m4",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-m4.spdx.json",
+        )
+
+        found_external_refs = False
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.externalRef:
+                found_external_refs = True
+                for ref in pkg.externalRef:
+                    self.assertIsNotNone(ref.externalRefType)
+                    self.assertIsNotNone(ref.locator)
+                    self.assertGreater(len(ref.locator), 0, "Locator should have at least one entry")
+                    for loc in ref.locator:
+                        self.assertIsInstance(loc, str)
+                break
+
+        self.logger.info(
+            f"External references {'found' if found_external_refs else 'not found'} "
+            f"in SPDX output (defensive handling verified)"
+        )
+
+    def test_version_extraction_patterns(self):
+        """Test that version extraction works for various package formats.
+
+        Verifies that Git source downloads carry extracted versions and that
+        the reported version strings are well-formed.
+        """
+        objset = self.check_recipe_spdx(
+            "opkg-utils",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-opkg-utils.spdx.json",
+        )
+
+        # Collect all packages with versions
+        packages_with_versions = []
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.software_packageVersion:
+                packages_with_versions.append((pkg.name, pkg.software_packageVersion))
+
+        self.assertGreater(
+            len(packages_with_versions), 0,
+            "Should find packages with extracted versions"
+        )
+
+        for name, version in packages_with_versions:
+            self.assertRegex(
+                version,
+                r"^[0-9a-f]{40}$",
+                f"Expected Git source version for {name} to be a full SHA-1",
+            )
+
+        self.logger.info(f"Found {len(packages_with_versions)} packages with versions")
+
+        # Log some examples for debugging
+        for name, version in packages_with_versions[:5]:
+            self.logger.info(f"  {name}: {version}")
+
+        # Verify that versions follow expected patterns
+        for name, version in packages_with_versions:
+            # Version should not be empty
+            self.assertIsNotNone(version)
+            self.assertNotEqual(version, "")
+
+            # Version should contain digits
+            self.assertRegex(
+                version,
+                r'\d',
+                f"Version '{version}' for package '{name}' should contain digits"
+            )
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 0/5] spdx30: PURL and source download enrichment
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (9 preceding siblings ...)
  2026-03-24 17:12   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
@ 2026-03-24 17:14   ` Stefano Tondo
  2026-03-24 17:14   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
                     ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:14 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

This series enhances Yocto's SPDX 3.0 output with ecosystem-specific
Package URLs, Git source version tracking, and configurable SBOM
generation improvements.

Changes since v14 (addressing Joshua's review of v14 patches 3/4 and 4/4):
- Split the monolithic "Enrich source downloads" patch into two focused
  patches: ecosystem PURLs (patch 3, bbclass-only) and Git download
  enrichment (patch 4, spdx30_tasks.py additions-only)
- Removed ALL extraneous changes: formatting, comment removals,
  blank-line deletions, variable removals, and off-topic refactors
- Fixed go-mod.bbclass duplicate SPDX_PACKAGE_URLS line
- Reverted pypi.bbclass UPSTREAM_CHECK_PYPI_PACKAGE change
- Removed the else-branch that copied recipe ecosystem PURLs to
  non-git download files (per Joshua's feedback)
- Reverted do_create_spdx -> do_create_package_spdx change in
  collect_build_package_inputs
- Reverted bb.note -> bb.fatal for missing SPDX providers
- Restored all removed TODO comments and blank lines
- Patches 2-5 are now strictly additions-only (0 deletions)
- Tests unchanged (additions-only, all 12 master tests preserved)

Stefano Tondo (5):
  spdx30: Add configurable file exclusion pattern support
  spdx30: Add supplier support for image and SDK SBOMs
  spdx30: Add ecosystem PURLs for recipe classes
  spdx30: Add Git version and PURL to source downloads
  oeqa/selftest: Add tests for source download enrichment

 meta/classes-recipe/cargo_common.bbclass |   3 +
 meta/classes-recipe/cpan.bbclass         |  11 ++
 meta/classes-recipe/go-mod.bbclass       |   3 +
 meta/classes-recipe/npm.bbclass          |   7 +
 meta/classes-recipe/pypi.bbclass         |   3 +
 meta/classes/create-spdx-3.0.bbclass     |  17 ++
 meta/classes/spdx-common.bbclass         |   7 +
 meta/lib/oe/spdx30_tasks.py              | 202 ++++++++++++++++++++---
 meta/lib/oeqa/selftest/cases/spdx.py     |  76 +++++++++
 9 files changed, 302 insertions(+), 27 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (10 preceding siblings ...)
  2026-03-24 17:14   ` [PATCH v16 0/5] spdx30: PURL and " Stefano Tondo
@ 2026-03-24 17:14   ` Stefano Tondo
  2026-03-26 20:11     ` Joshua Watt
  2026-03-24 17:14   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
                     ` (3 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:14 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_FILE_EXCLUDE_PATTERNS variable that allows filtering files from
SPDX output by regex matching. The variable accepts a space-separated
list of Python regular expressions; files whose paths match any pattern
(via re.search) are excluded.

When empty (the default), no filtering is applied and all files are
included, preserving existing behavior.

This enables users to reduce SBOM size by excluding files that are not
relevant for compliance (e.g., test files, object files, patches).

Excluded files are tracked in a set returned from add_package_files()
and passed to get_package_sources_from_debug(), which uses the set for
precise cross-checking rather than re-evaluating patterns.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/spdx-common.bbclass |  7 +++
 meta/lib/oe/spdx30_tasks.py      | 80 +++++++++++++++++++++-----------
 2 files changed, 60 insertions(+), 27 deletions(-)

diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
index 83f05579b6..40701730a6 100644
--- a/meta/classes/spdx-common.bbclass
+++ b/meta/classes/spdx-common.bbclass
@@ -82,6 +82,13 @@ SPDX_MULTILIB_SSTATE_ARCHS[doc] = "The list of sstate architectures to consider
     when collecting SPDX dependencies. This includes multilib architectures when \
     multilib is enabled. Defaults to SSTATE_ARCHS."
 
+SPDX_FILE_EXCLUDE_PATTERNS ??= ""
+SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of Python regular \
+    expressions to exclude files from SPDX output. Files whose paths match \
+    any pattern (via re.search) will be filtered out. Defaults to empty \
+    (no filtering). Example: \
+    SPDX_FILE_EXCLUDE_PATTERNS = '\\.patch$ \\.diff$ /test/ \\.pyc$ \\.o$'"
+
 python () {
     from oe.cve_check import extend_cve_status
     extend_cve_status(d)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 353d783fa2..68ed821a8c 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -13,6 +13,7 @@ import oe.spdx30
 import oe.spdx_common
 import oe.sdk
 import os
+import re
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -157,17 +158,27 @@ def add_package_files(
     file_counter = 1
     if not os.path.exists(topdir):
         bb.note(f"Skip {topdir}")
-        return spdx_files
+        return spdx_files, set()
 
     check_compiled_sources = d.getVar("SPDX_INCLUDE_COMPILED_SOURCES") == "1"
     if check_compiled_sources:
         compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
         bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
 
+    exclude_patterns = [
+        re.compile(pattern)
+        for pattern in (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
+    ]
+    excluded_files = set()
+
     for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
-        dirs[:] = [d for d in dirs if d not in ignore_dirs]
+        dirs[:] = [directory for directory in dirs if directory not in ignore_dirs]
         if subdir == str(topdir):
-            dirs[:] = [d for d in dirs if d not in ignore_top_level_dirs]
+            dirs[:] = [
+                directory
+                for directory in dirs
+                if directory not in ignore_top_level_dirs
+            ]
 
         dirs.sort()
         files.sort()
@@ -177,14 +188,19 @@ def add_package_files(
                 continue
 
             filename = str(filepath.relative_to(topdir))
+
+            if exclude_patterns and any(
+                pattern.search(filename) for pattern in exclude_patterns
+            ):
+                excluded_files.add(filename)
+                continue
+
             file_purposes = get_purposes(filepath)
 
-            # Check if file is compiled
-            if check_compiled_sources:
-                if not oe.spdx_common.is_compiled_source(
-                    filename, compiled_sources, types
-                ):
-                    continue
+            if check_compiled_sources and not oe.spdx_common.is_compiled_source(
+                filename, compiled_sources, types
+            ):
+                continue
 
             spdx_file = objset.new_file(
                 get_spdxid(file_counter),
@@ -218,12 +234,15 @@ def add_package_files(
 
     bb.debug(1, "Added %d files to %s" % (len(spdx_files), objset.doc._id))
 
-    return spdx_files
+    return spdx_files, excluded_files
 
 
 def get_package_sources_from_debug(
-    d, package, package_files, sources, source_hash_cache
+    d, package, package_files, sources, source_hash_cache, excluded_files=None
 ):
+    if excluded_files is None:
+        excluded_files = set()
+
     def file_path_match(file_path, pkg_file):
         if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
             return True
@@ -256,6 +275,12 @@ def get_package_sources_from_debug(
             continue
 
         if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
+            if file_path.lstrip("/") in excluded_files:
+                bb.debug(
+                    1,
+                    f"Skipping debug source lookup for excluded file {file_path} in {package}",
+                )
+                continue
             bb.fatal(
                 "No package file found for %s in %s; SPDX found: %s"
                 % (str(file_path), package, " ".join(p.name for p in package_files))
@@ -737,7 +762,7 @@ def create_spdx(d):
         bb.debug(1, "Adding source files to SPDX")
         oe.spdx_common.get_patched_src(d)
 
-        files = add_package_files(
+        files, _ = add_package_files(
             d,
             build_objset,
             spdx_workdir,
@@ -909,7 +934,7 @@ def create_spdx(d):
                 )
 
             bb.debug(1, "Adding package files to SPDX for package %s" % pkg_name)
-            package_files = add_package_files(
+            package_files, excluded_files = add_package_files(
                 d,
                 pkg_objset,
                 pkgdest / package,
@@ -932,7 +957,8 @@ def create_spdx(d):
 
             if include_sources:
                 debug_sources = get_package_sources_from_debug(
-                    d, package, package_files, dep_sources, source_hash_cache
+                    d, package, package_files, dep_sources, source_hash_cache,
+                    excluded_files=excluded_files,
                 )
                 debug_source_ids |= set(
                     oe.sbom30.get_element_link_id(d) for d in debug_sources
@@ -944,7 +970,7 @@ def create_spdx(d):
 
     if include_sources:
         bb.debug(1, "Adding sysroot files to SPDX")
-        sysroot_files = add_package_files(
+        sysroot_files, _ = add_package_files(
             d,
             build_objset,
             d.expand("${COMPONENTS_DIR}/${PACKAGE_ARCH}/${PN}"),
@@ -1326,18 +1352,18 @@ def create_image_spdx(d):
             image_filename = image["filename"]
             image_path = image_deploy_dir / image_filename
             if os.path.isdir(image_path):
-                a = add_package_files(
-                    d,
-                    objset,
-                    image_path,
-                    lambda file_counter: objset.new_spdxid(
-                        "imagefile", str(file_counter)
-                    ),
-                    lambda filepath: [],
-                    license_data=None,
-                    ignore_dirs=[],
-                    ignore_top_level_dirs=[],
-                    archive=None,
+                a, _ = add_package_files(
+                        d,
+                        objset,
+                        image_path,
+                        lambda file_counter: objset.new_spdxid(
+                            "imagefile", str(file_counter)
+                        ),
+                        lambda filepath: [],
+                        license_data=None,
+                        ignore_dirs=[],
+                        ignore_top_level_dirs=[],
+                        archive=None,
                 )
                 artifacts.extend(a)
             else:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (11 preceding siblings ...)
  2026-03-24 17:14   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
@ 2026-03-24 17:14   ` Stefano Tondo
  2026-03-26 20:12     ` Joshua Watt
  2026-03-24 17:15   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
                     ` (2 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:14 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_IMAGE_SUPPLIER and SPDX_SDK_SUPPLIER variables that allow
setting a supplier agent on image and SDK SBOM root elements using
the suppliedBy property.

These follow the existing SPDX_PACKAGE_SUPPLIER pattern and use the
standard agent variable system to define supplier information.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/create-spdx-3.0.bbclass | 10 ++++++++++
 meta/lib/oe/spdx30_tasks.py          | 20 ++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 7515f460c3..9a6606dce6 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -124,6 +124,16 @@ SPDX_ON_BEHALF_OF[doc] = "The base variable name to describe the Agent on who's
 SPDX_PACKAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
     is supplying artifacts produced by the build"
 
+SPDX_IMAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the image SBOM. The supplier will be set on all root elements \
+    of the image SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the image SBOM."
+
+SPDX_SDK_SUPPLIER[doc] = "The base variable name to describe the Agent who \
+    is supplying the SDK SBOM. The supplier will be set on all root elements \
+    of the SDK SBOM using the suppliedBy property. If not set, no supplier \
+    information will be added to the SDK SBOM."
+
 SPDX_PACKAGE_VERSION ??= "${PV}"
 SPDX_PACKAGE_VERSION[doc] = "The version of a package, software_packageVersion \
     in software_Package"
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 68ed821a8c..51e10befba 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -1449,6 +1449,16 @@ def create_image_sbom_spdx(d):
 
     objset, sbom = oe.sbom30.create_sbom(d, image_name, root_elements)
 
+    # Set supplier on root elements if SPDX_IMAGE_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_IMAGE_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(d, objset, spdx_path)
 
     def make_image_link(target_path, suffix):
@@ -1560,6 +1570,16 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
         d, toolchain_outputname, sorted(list(files)), [rootfs_objset]
     )
 
+    # Set supplier on root elements if SPDX_SDK_SUPPLIER is defined
+    supplier = objset.new_agent("SPDX_SDK_SUPPLIER", add=False)
+    if supplier is not None:
+        supplier_id = supplier if isinstance(supplier, str) else supplier._id
+        if not isinstance(supplier, str):
+            objset.add(supplier)
+        for elem in sbom.rootElement:
+            if hasattr(elem, "suppliedBy"):
+                elem.suppliedBy = supplier_id
+
     oe.sbom30.write_jsonld_doc(
         d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
     )
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (12 preceding siblings ...)
  2026-03-24 17:14   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
@ 2026-03-24 17:15   ` Stefano Tondo
  2026-03-26 20:13     ` Joshua Watt
  2026-03-24 17:15   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
  2026-03-24 17:15   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
  15 siblings, 1 reply; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:15 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add SPDX_PACKAGE_URLS to recipe classes to generate ecosystem-specific
Package URLs for SPDX 3.0 SBOMs. This enables proper package
identification across different packaging ecosystems.

Classes updated:
- cargo_common.bbclass: pkg:cargo PURLs for Rust crates
- cpan.bbclass: pkg:cpan PURLs for Perl modules (with name normalization)
- go-mod.bbclass: pkg:golang PURLs for Go modules
- npm.bbclass: pkg:npm PURLs for Node.js packages (with name normalization)
- pypi.bbclass: pkg:pypi PURLs for Python packages (with name normalization)

The SPDX_PACKAGE_URLS variable is a space-separated list which
create-spdx-3.0 already reads via set_purls() to populate
software_packageUrl and externalIdentifier on recipe packages.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes-recipe/cargo_common.bbclass |  3 +++
 meta/classes-recipe/cpan.bbclass         | 11 +++++++++++
 meta/classes-recipe/go-mod.bbclass       |  3 +++
 meta/classes-recipe/npm.bbclass          |  7 +++++++
 meta/classes-recipe/pypi.bbclass         |  3 +++
 5 files changed, 27 insertions(+)

diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
index bc44ad7918..0d3edfe4a7 100644
--- a/meta/classes-recipe/cargo_common.bbclass
+++ b/meta/classes-recipe/cargo_common.bbclass
@@ -240,3 +240,6 @@ EXPORT_FUNCTIONS do_configure
 # https://github.com/rust-lang/libc/issues/3223
 # https://github.com/rust-lang/libc/pull/3175
 INSANE_SKIP:append = " 32bit-time"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:cargo/${BPN}@${PV} "
diff --git a/meta/classes-recipe/cpan.bbclass b/meta/classes-recipe/cpan.bbclass
index bb76a5b326..dbf44da9d2 100644
--- a/meta/classes-recipe/cpan.bbclass
+++ b/meta/classes-recipe/cpan.bbclass
@@ -68,4 +68,15 @@ cpan_do_install () {
 	done
 }
 
+# Generate ecosystem-specific Package URL for SPDX
+def cpan_spdx_name(d):
+    bpn = d.getVar('BPN')
+    if bpn.startswith('perl-'):
+        return bpn[5:]
+    elif bpn.startswith('libperl-'):
+        return bpn[8:]
+    return bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:cpan/${@cpan_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/go-mod.bbclass b/meta/classes-recipe/go-mod.bbclass
index a15dda8f0e..0f5835f26e 100644
--- a/meta/classes-recipe/go-mod.bbclass
+++ b/meta/classes-recipe/go-mod.bbclass
@@ -32,3 +32,6 @@ do_compile[dirs] += "${B}/src/${GO_WORKDIR}"
 # Make go install unpack the module zip files in the module cache directory
 # before the license directory is polulated with license files.
 addtask do_compile before do_populate_lic
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
diff --git a/meta/classes-recipe/npm.bbclass b/meta/classes-recipe/npm.bbclass
index 344e8b4bec..7bb791d543 100644
--- a/meta/classes-recipe/npm.bbclass
+++ b/meta/classes-recipe/npm.bbclass
@@ -354,4 +354,11 @@ FILES:${PN} += " \
     ${nonarch_libdir} \
 "
 
+# Generate ecosystem-specific Package URL for SPDX
+def npm_spdx_name(d):
+    bpn = d.getVar('BPN')
+    return bpn[5:] if bpn.startswith('node-') else bpn
+
+SPDX_PACKAGE_URLS =+ "pkg:npm/${@npm_spdx_name(d)}@${PV} "
+
 EXPORT_FUNCTIONS do_configure do_compile do_install
diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
index 9d46c035f6..bd21557c60 100644
--- a/meta/classes-recipe/pypi.bbclass
+++ b/meta/classes-recipe/pypi.bbclass
@@ -54,3 +54,6 @@ UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${@pypi_normalize(d)}/"
 UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz|zip|tar\.bz2)"
 
 CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
+
+# Generate ecosystem-specific Package URL for SPDX
+SPDX_PACKAGE_URLS =+ "pkg:pypi/${@pypi_normalize(d)}@${PV} "
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (13 preceding siblings ...)
  2026-03-24 17:15   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
@ 2026-03-24 17:15   ` Stefano Tondo
  2026-03-24 17:15   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
  15 siblings, 0 replies; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:15 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Enrich Git source download packages in the SPDX 3.0 output with:
- software_packageVersion set to the full SHA-1 commit hash
- software_packageUrl set to a PURL for known Git hosting services
- VCS external reference pointing to the repository URL

The PURL generation recognizes github.com by default and supports
additional hosting services via the SPDX_GIT_PURL_MAPPINGS variable
(format: 'domain:purl_type', e.g. 'gitlab.example.com:pkg:gitlab').

Only Git source downloads are enriched. Non-Git downloads are left
unchanged since their ecosystem PURLs are already set on the recipe
package by SPDX_PACKAGE_URLS from the previous patch.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/create-spdx-3.0.bbclass |   7 ++
 meta/lib/oe/spdx30_tasks.py          | 102 +++++++++++++++++++++++++++
 2 files changed, 109 insertions(+)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index 9a6606dce6..432adb14cd 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -156,6 +156,13 @@ SPDX_RECIPE_SBOM_NAME ?= "${PN}-recipe-sbom"
 SPDX_RECIPE_SBOM_NAME[doc] = "The name of output recipe SBoM when using \
     create_recipe_sbom"
 
+SPDX_GIT_PURL_MAPPINGS ??= ""
+SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
+    mappings to configure PURL generation for Git source downloads. \
+    For example, 'gitlab.example.com:pkg:gitlab' maps repositories hosted \
+    on gitlab.example.com to the pkg:gitlab PURL type. \
+    github.com is always mapped to pkg:github by default."
+
 IMAGE_CLASSES:append = " create-spdx-image-3.0"
 SDK_CLASSES += "create-spdx-sdk-3.0"
 
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 51e10befba..cd9672c18e 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -14,6 +14,7 @@ import oe.spdx_common
 import oe.sdk
 import os
 import re
+import urllib.parse
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -384,6 +385,105 @@ def collect_dep_sources(dep_objsets, dest):
             index_sources_by_hash(e.to, dest)
 
 
+
+def _generate_git_purl(d, download_location, srcrev):
+    """Generate a Package URL for a Git source from its download location.
+
+    Parses the Git URL to identify the hosting service and generates the
+    appropriate PURL type. Supports github.com by default and custom
+    mappings via SPDX_GIT_PURL_MAPPINGS.
+
+    Returns the PURL string or None if no mapping matches.
+    """
+    if not download_location or not download_location.startswith('git+'):
+        return None
+
+    git_url = download_location[4:]  # Remove 'git+' prefix
+
+    # Default handler: github.com
+    git_purl_handlers = {
+        'github.com': 'pkg:github',
+    }
+
+    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
+    # Format: "domain1:purl_type1 domain2:purl_type2"
+    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+    if custom_mappings:
+        for mapping in custom_mappings.split():
+            parts = mapping.split(':', 1)
+            if len(parts) == 2:
+                git_purl_handlers[parts[0]] = parts[1]
+                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
+            else:
+                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+    try:
+        parsed = urllib.parse.urlparse(git_url)
+    except Exception:
+        return None
+
+    hostname = parsed.hostname
+    if not hostname:
+        return None
+
+    for domain, purl_type in git_purl_handlers.items():
+        if hostname == domain:
+            path = parsed.path.strip('/')
+            path_parts = path.split('/')
+            if len(path_parts) >= 2:
+                owner = path_parts[0]
+                repo = path_parts[1].replace('.git', '')
+                return f"{purl_type}/{owner}/{repo}@{srcrev}"
+            break
+
+    return None
+
+
+def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
+    """Enrich a Git source download package with version, PURL, and external refs.
+
+    For Git sources, extracts the full SHA-1 from SRCREV as the version,
+    generates PURLs for known hosting services, and adds VCS external
+    references.
+    """
+    version = None
+    purl = None
+
+    if fd.type == "git":
+        # Use full SHA-1 from fd.revision
+        srcrev = getattr(fd, 'revision', None)
+        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
+            version = srcrev
+
+        # Generate PURL for Git hosting services
+        download_location = getattr(dl, 'software_downloadLocation', None)
+        if version and download_location:
+            purl = _generate_git_purl(d, download_location, version)
+
+    if version:
+        dl.software_packageVersion = version
+
+    if purl:
+        dl.software_packageUrl = purl
+
+    # Add VCS external reference for Git repositories
+    download_location = getattr(dl, 'software_downloadLocation', None)
+    if download_location and isinstance(download_location, str):
+        if download_location.startswith('git+'):
+            git_url = download_location[4:]
+            if '@' in git_url:
+                git_url = git_url.split('@')[0]
+
+            dl.externalRef = dl.externalRef or []
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.vcs,
+                    locator=[git_url],
+                )
+            )
+
+
+
 def add_download_files(d, objset):
     inputs = set()
 
@@ -447,6 +547,8 @@ def add_download_files(d, objset):
                 )
             )
 
+            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
+
             if fd.method.supports_checksum(fd):
                 # TODO Need something better than hard coding this
                 for checksum_id in ["sha256", "sha1"]:
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment
  2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
                     ` (14 preceding siblings ...)
  2026-03-24 17:15   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
@ 2026-03-24 17:15   ` Stefano Tondo
  2026-03-26 20:15     ` [OE-core] " Joshua Watt
  15 siblings, 1 reply; 32+ messages in thread
From: Stefano Tondo @ 2026-03-24 17:15 UTC (permalink / raw)
  To: openembedded-core
  Cc: richard.purdie, ross.burton, jpewhacker, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

Add two new test methods to SPDX30Check:

test_download_location_defensive_handling:
  Builds m4 and verifies that SPDX generation succeeds and any
  external references present are properly structured with valid
  types and locator strings.

test_version_extraction_patterns:
  Builds opkg-utils (a Git-based recipe) and verifies that source
  download packages carry the full SHA-1 commit hash as their
  software_packageVersion.

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/lib/oeqa/selftest/cases/spdx.py | 76 ++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/meta/lib/oeqa/selftest/cases/spdx.py b/meta/lib/oeqa/selftest/cases/spdx.py
index af1144c1e5..9347e0bf7b 100644
--- a/meta/lib/oeqa/selftest/cases/spdx.py
+++ b/meta/lib/oeqa/selftest/cases/spdx.py
@@ -428,3 +428,79 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
                 value, ["enabled", "disabled"],
                 f"Unexpected PACKAGECONFIG value '{value}' for {key}"
             )
+
+    def test_download_location_defensive_handling(self):
+        """Test that download_location handling is defensive.
+
+        Verifies SPDX generation succeeds and external references are
+        properly structured when download_location retrieval works.
+        """
+        objset = self.check_recipe_spdx(
+            "m4",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-m4.spdx.json",
+        )
+
+        found_external_refs = False
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.externalRef:
+                found_external_refs = True
+                for ref in pkg.externalRef:
+                    self.assertIsNotNone(ref.externalRefType)
+                    self.assertIsNotNone(ref.locator)
+                    self.assertGreater(len(ref.locator), 0, "Locator should have at least one entry")
+                    for loc in ref.locator:
+                        self.assertIsInstance(loc, str)
+                break
+
+        self.logger.info(
+            f"External references {'found' if found_external_refs else 'not found'} "
+            f"in SPDX output (defensive handling verified)"
+        )
+
+    def test_version_extraction_patterns(self):
+        """Test that version extraction works for various package formats.
+
+        Verifies that Git source downloads carry extracted versions and that
+        the reported version strings are well-formed.
+        """
+        objset = self.check_recipe_spdx(
+            "opkg-utils",
+            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-opkg-utils.spdx.json",
+        )
+
+        # Collect all packages with versions
+        packages_with_versions = []
+        for pkg in objset.foreach_type(oe.spdx30.software_Package):
+            if pkg.software_packageVersion:
+                packages_with_versions.append((pkg.name, pkg.software_packageVersion))
+
+        self.assertGreater(
+            len(packages_with_versions), 0,
+            "Should find packages with extracted versions"
+        )
+
+        for name, version in packages_with_versions:
+            self.assertRegex(
+                version,
+                r"^[0-9a-f]{40}$",
+                f"Expected Git source version for {name} to be a full SHA-1",
+            )
+
+        self.logger.info(f"Found {len(packages_with_versions)} packages with versions")
+
+        # Log some examples for debugging
+        for name, version in packages_with_versions[:5]:
+            self.logger.info(f"  {name}: {version}")
+
+        # Verify that versions follow expected patterns
+        for name, version in packages_with_versions:
+            # Version should not be empty
+            self.assertIsNotNone(version)
+            self.assertNotEqual(version, "")
+
+            # Version should contain digits
+            self.assertRegex(
+                version,
+                r'\d',
+                f"Version '{version}' for package '{name}' should contain digits"
+            )
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support
  2026-03-24 17:14   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
@ 2026-03-26 20:11     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-26 20:11 UTC (permalink / raw)
  To: Stefano Tondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 11:15 AM Stefano Tondo <stondo@gmail.com> wrote:
>
> Add SPDX_FILE_EXCLUDE_PATTERNS variable that allows filtering files from
> SPDX output by regex matching. The variable accepts a space-separated
> list of Python regular expressions; files whose paths match any pattern
> (via re.search) are excluded.
>
> When empty (the default), no filtering is applied and all files are
> included, preserving existing behavior.
>
> This enables users to reduce SBOM size by excluding files that are not
> relevant for compliance (e.g., test files, object files, patches).
>
> Excluded files are tracked in a set returned from add_package_files()
> and passed to get_package_sources_from_debug(), which uses the set for
> precise cross-checking rather than re-evaluating patterns.

LGTM, Thanks

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

>
> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/classes/spdx-common.bbclass |  7 +++
>  meta/lib/oe/spdx30_tasks.py      | 80 +++++++++++++++++++++-----------
>  2 files changed, 60 insertions(+), 27 deletions(-)
>
> diff --git a/meta/classes/spdx-common.bbclass b/meta/classes/spdx-common.bbclass
> index 83f05579b6..40701730a6 100644
> --- a/meta/classes/spdx-common.bbclass
> +++ b/meta/classes/spdx-common.bbclass
> @@ -82,6 +82,13 @@ SPDX_MULTILIB_SSTATE_ARCHS[doc] = "The list of sstate architectures to consider
>      when collecting SPDX dependencies. This includes multilib architectures when \
>      multilib is enabled. Defaults to SSTATE_ARCHS."
>
> +SPDX_FILE_EXCLUDE_PATTERNS ??= ""
> +SPDX_FILE_EXCLUDE_PATTERNS[doc] = "Space-separated list of Python regular \
> +    expressions to exclude files from SPDX output. Files whose paths match \
> +    any pattern (via re.search) will be filtered out. Defaults to empty \
> +    (no filtering). Example: \
> +    SPDX_FILE_EXCLUDE_PATTERNS = '\\.patch$ \\.diff$ /test/ \\.pyc$ \\.o$'"
> +
>  python () {
>      from oe.cve_check import extend_cve_status
>      extend_cve_status(d)
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 353d783fa2..68ed821a8c 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -13,6 +13,7 @@ import oe.spdx30
>  import oe.spdx_common
>  import oe.sdk
>  import os
> +import re
>
>  from contextlib import contextmanager
>  from datetime import datetime, timezone
> @@ -157,17 +158,27 @@ def add_package_files(
>      file_counter = 1
>      if not os.path.exists(topdir):
>          bb.note(f"Skip {topdir}")
> -        return spdx_files
> +        return spdx_files, set()
>
>      check_compiled_sources = d.getVar("SPDX_INCLUDE_COMPILED_SOURCES") == "1"
>      if check_compiled_sources:
>          compiled_sources, types = oe.spdx_common.get_compiled_sources(d)
>          bb.debug(1, f"Total compiled files: {len(compiled_sources)}")
>
> +    exclude_patterns = [
> +        re.compile(pattern)
> +        for pattern in (d.getVar("SPDX_FILE_EXCLUDE_PATTERNS") or "").split()
> +    ]
> +    excluded_files = set()
> +
>      for subdir, dirs, files in os.walk(topdir, onerror=walk_error):
> -        dirs[:] = [d for d in dirs if d not in ignore_dirs]
> +        dirs[:] = [directory for directory in dirs if directory not in ignore_dirs]
>          if subdir == str(topdir):
> -            dirs[:] = [d for d in dirs if d not in ignore_top_level_dirs]
> +            dirs[:] = [
> +                directory
> +                for directory in dirs
> +                if directory not in ignore_top_level_dirs
> +            ]
>
>          dirs.sort()
>          files.sort()
> @@ -177,14 +188,19 @@ def add_package_files(
>                  continue
>
>              filename = str(filepath.relative_to(topdir))
> +
> +            if exclude_patterns and any(
> +                pattern.search(filename) for pattern in exclude_patterns
> +            ):
> +                excluded_files.add(filename)
> +                continue
> +
>              file_purposes = get_purposes(filepath)
>
> -            # Check if file is compiled
> -            if check_compiled_sources:
> -                if not oe.spdx_common.is_compiled_source(
> -                    filename, compiled_sources, types
> -                ):
> -                    continue
> +            if check_compiled_sources and not oe.spdx_common.is_compiled_source(
> +                filename, compiled_sources, types
> +            ):
> +                continue
>
>              spdx_file = objset.new_file(
>                  get_spdxid(file_counter),
> @@ -218,12 +234,15 @@ def add_package_files(
>
>      bb.debug(1, "Added %d files to %s" % (len(spdx_files), objset.doc._id))
>
> -    return spdx_files
> +    return spdx_files, excluded_files
>
>
>  def get_package_sources_from_debug(
> -    d, package, package_files, sources, source_hash_cache
> +    d, package, package_files, sources, source_hash_cache, excluded_files=None
>  ):
> +    if excluded_files is None:
> +        excluded_files = set()
> +
>      def file_path_match(file_path, pkg_file):
>          if file_path.lstrip("/") == pkg_file.name.lstrip("/"):
>              return True
> @@ -256,6 +275,12 @@ def get_package_sources_from_debug(
>              continue
>
>          if not any(file_path_match(file_path, pkg_file) for pkg_file in package_files):
> +            if file_path.lstrip("/") in excluded_files:
> +                bb.debug(
> +                    1,
> +                    f"Skipping debug source lookup for excluded file {file_path} in {package}",
> +                )
> +                continue
>              bb.fatal(
>                  "No package file found for %s in %s; SPDX found: %s"
>                  % (str(file_path), package, " ".join(p.name for p in package_files))
> @@ -737,7 +762,7 @@ def create_spdx(d):
>          bb.debug(1, "Adding source files to SPDX")
>          oe.spdx_common.get_patched_src(d)
>
> -        files = add_package_files(
> +        files, _ = add_package_files(
>              d,
>              build_objset,
>              spdx_workdir,
> @@ -909,7 +934,7 @@ def create_spdx(d):
>                  )
>
>              bb.debug(1, "Adding package files to SPDX for package %s" % pkg_name)
> -            package_files = add_package_files(
> +            package_files, excluded_files = add_package_files(
>                  d,
>                  pkg_objset,
>                  pkgdest / package,
> @@ -932,7 +957,8 @@ def create_spdx(d):
>
>              if include_sources:
>                  debug_sources = get_package_sources_from_debug(
> -                    d, package, package_files, dep_sources, source_hash_cache
> +                    d, package, package_files, dep_sources, source_hash_cache,
> +                    excluded_files=excluded_files,
>                  )
>                  debug_source_ids |= set(
>                      oe.sbom30.get_element_link_id(d) for d in debug_sources
> @@ -944,7 +970,7 @@ def create_spdx(d):
>
>      if include_sources:
>          bb.debug(1, "Adding sysroot files to SPDX")
> -        sysroot_files = add_package_files(
> +        sysroot_files, _ = add_package_files(
>              d,
>              build_objset,
>              d.expand("${COMPONENTS_DIR}/${PACKAGE_ARCH}/${PN}"),
> @@ -1326,18 +1352,18 @@ def create_image_spdx(d):
>              image_filename = image["filename"]
>              image_path = image_deploy_dir / image_filename
>              if os.path.isdir(image_path):
> -                a = add_package_files(
> -                    d,
> -                    objset,
> -                    image_path,
> -                    lambda file_counter: objset.new_spdxid(
> -                        "imagefile", str(file_counter)
> -                    ),
> -                    lambda filepath: [],
> -                    license_data=None,
> -                    ignore_dirs=[],
> -                    ignore_top_level_dirs=[],
> -                    archive=None,
> +                a, _ = add_package_files(
> +                        d,
> +                        objset,
> +                        image_path,
> +                        lambda file_counter: objset.new_spdxid(
> +                            "imagefile", str(file_counter)
> +                        ),
> +                        lambda filepath: [],
> +                        license_data=None,
> +                        ignore_dirs=[],
> +                        ignore_top_level_dirs=[],
> +                        archive=None,
>                  )
>                  artifacts.extend(a)
>              else:
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs
  2026-03-24 17:14   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
@ 2026-03-26 20:12     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-26 20:12 UTC (permalink / raw)
  To: Stefano Tondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 11:15 AM Stefano Tondo <stondo@gmail.com> wrote:
>
> Add SPDX_IMAGE_SUPPLIER and SPDX_SDK_SUPPLIER variables that allow
> setting a supplier agent on image and SDK SBOM root elements using
> the suppliedBy property.
>
> These follow the existing SPDX_PACKAGE_SUPPLIER pattern and use the
> standard agent variable system to define supplier information.
>

LGTM, Thanks

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/classes/create-spdx-3.0.bbclass | 10 ++++++++++
>  meta/lib/oe/spdx30_tasks.py          | 20 ++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
> index 7515f460c3..9a6606dce6 100644
> --- a/meta/classes/create-spdx-3.0.bbclass
> +++ b/meta/classes/create-spdx-3.0.bbclass
> @@ -124,6 +124,16 @@ SPDX_ON_BEHALF_OF[doc] = "The base variable name to describe the Agent on who's
>  SPDX_PACKAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
>      is supplying artifacts produced by the build"
>
> +SPDX_IMAGE_SUPPLIER[doc] = "The base variable name to describe the Agent who \
> +    is supplying the image SBOM. The supplier will be set on all root elements \
> +    of the image SBOM using the suppliedBy property. If not set, no supplier \
> +    information will be added to the image SBOM."
> +
> +SPDX_SDK_SUPPLIER[doc] = "The base variable name to describe the Agent who \
> +    is supplying the SDK SBOM. The supplier will be set on all root elements \
> +    of the SDK SBOM using the suppliedBy property. If not set, no supplier \
> +    information will be added to the SDK SBOM."
> +
>  SPDX_PACKAGE_VERSION ??= "${PV}"
>  SPDX_PACKAGE_VERSION[doc] = "The version of a package, software_packageVersion \
>      in software_Package"
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 68ed821a8c..51e10befba 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -1449,6 +1449,16 @@ def create_image_sbom_spdx(d):
>
>      objset, sbom = oe.sbom30.create_sbom(d, image_name, root_elements)
>
> +    # Set supplier on root elements if SPDX_IMAGE_SUPPLIER is defined
> +    supplier = objset.new_agent("SPDX_IMAGE_SUPPLIER", add=False)
> +    if supplier is not None:
> +        supplier_id = supplier if isinstance(supplier, str) else supplier._id
> +        if not isinstance(supplier, str):
> +            objset.add(supplier)
> +        for elem in sbom.rootElement:
> +            if hasattr(elem, "suppliedBy"):
> +                elem.suppliedBy = supplier_id
> +
>      oe.sbom30.write_jsonld_doc(d, objset, spdx_path)
>
>      def make_image_link(target_path, suffix):
> @@ -1560,6 +1570,16 @@ def create_sdk_sbom(d, sdk_deploydir, spdx_work_dir, toolchain_outputname):
>          d, toolchain_outputname, sorted(list(files)), [rootfs_objset]
>      )
>
> +    # Set supplier on root elements if SPDX_SDK_SUPPLIER is defined
> +    supplier = objset.new_agent("SPDX_SDK_SUPPLIER", add=False)
> +    if supplier is not None:
> +        supplier_id = supplier if isinstance(supplier, str) else supplier._id
> +        if not isinstance(supplier, str):
> +            objset.add(supplier)
> +        for elem in sbom.rootElement:
> +            if hasattr(elem, "suppliedBy"):
> +                elem.suppliedBy = supplier_id
> +
>      oe.sbom30.write_jsonld_doc(
>          d, objset, sdk_deploydir / (toolchain_outputname + ".spdx.json")
>      )
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes
  2026-03-24 17:15   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
@ 2026-03-26 20:13     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-26 20:13 UTC (permalink / raw)
  To: Stefano Tondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 11:15 AM Stefano Tondo <stondo@gmail.com> wrote:
>
> Add SPDX_PACKAGE_URLS to recipe classes to generate ecosystem-specific
> Package URLs for SPDX 3.0 SBOMs. This enables proper package
> identification across different packaging ecosystems.
>
> Classes updated:
> - cargo_common.bbclass: pkg:cargo PURLs for Rust crates
> - cpan.bbclass: pkg:cpan PURLs for Perl modules (with name normalization)
> - go-mod.bbclass: pkg:golang PURLs for Go modules
> - npm.bbclass: pkg:npm PURLs for Node.js packages (with name normalization)
> - pypi.bbclass: pkg:pypi PURLs for Python packages (with name normalization)
>
> The SPDX_PACKAGE_URLS variable is a space-separated list which
> create-spdx-3.0 already reads via set_purls() to populate
> software_packageUrl and externalIdentifier on recipe packages.
>

LGTM, thanks

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/classes-recipe/cargo_common.bbclass |  3 +++
>  meta/classes-recipe/cpan.bbclass         | 11 +++++++++++
>  meta/classes-recipe/go-mod.bbclass       |  3 +++
>  meta/classes-recipe/npm.bbclass          |  7 +++++++
>  meta/classes-recipe/pypi.bbclass         |  3 +++
>  5 files changed, 27 insertions(+)
>
> diff --git a/meta/classes-recipe/cargo_common.bbclass b/meta/classes-recipe/cargo_common.bbclass
> index bc44ad7918..0d3edfe4a7 100644
> --- a/meta/classes-recipe/cargo_common.bbclass
> +++ b/meta/classes-recipe/cargo_common.bbclass
> @@ -240,3 +240,6 @@ EXPORT_FUNCTIONS do_configure
>  # https://github.com/rust-lang/libc/issues/3223
>  # https://github.com/rust-lang/libc/pull/3175
>  INSANE_SKIP:append = " 32bit-time"
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:cargo/${BPN}@${PV} "
> diff --git a/meta/classes-recipe/cpan.bbclass b/meta/classes-recipe/cpan.bbclass
> index bb76a5b326..dbf44da9d2 100644
> --- a/meta/classes-recipe/cpan.bbclass
> +++ b/meta/classes-recipe/cpan.bbclass
> @@ -68,4 +68,15 @@ cpan_do_install () {
>         done
>  }
>
> +# Generate ecosystem-specific Package URL for SPDX
> +def cpan_spdx_name(d):
> +    bpn = d.getVar('BPN')
> +    if bpn.startswith('perl-'):
> +        return bpn[5:]
> +    elif bpn.startswith('libperl-'):
> +        return bpn[8:]
> +    return bpn
> +
> +SPDX_PACKAGE_URLS =+ "pkg:cpan/${@cpan_spdx_name(d)}@${PV} "
> +
>  EXPORT_FUNCTIONS do_configure do_compile do_install
> diff --git a/meta/classes-recipe/go-mod.bbclass b/meta/classes-recipe/go-mod.bbclass
> index a15dda8f0e..0f5835f26e 100644
> --- a/meta/classes-recipe/go-mod.bbclass
> +++ b/meta/classes-recipe/go-mod.bbclass
> @@ -32,3 +32,6 @@ do_compile[dirs] += "${B}/src/${GO_WORKDIR}"
>  # Make go install unpack the module zip files in the module cache directory
>  # before the license directory is polulated with license files.
>  addtask do_compile before do_populate_lic
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:golang/${GO_IMPORT}@${PV} "
> diff --git a/meta/classes-recipe/npm.bbclass b/meta/classes-recipe/npm.bbclass
> index 344e8b4bec..7bb791d543 100644
> --- a/meta/classes-recipe/npm.bbclass
> +++ b/meta/classes-recipe/npm.bbclass
> @@ -354,4 +354,11 @@ FILES:${PN} += " \
>      ${nonarch_libdir} \
>  "
>
> +# Generate ecosystem-specific Package URL for SPDX
> +def npm_spdx_name(d):
> +    bpn = d.getVar('BPN')
> +    return bpn[5:] if bpn.startswith('node-') else bpn
> +
> +SPDX_PACKAGE_URLS =+ "pkg:npm/${@npm_spdx_name(d)}@${PV} "
> +
>  EXPORT_FUNCTIONS do_configure do_compile do_install
> diff --git a/meta/classes-recipe/pypi.bbclass b/meta/classes-recipe/pypi.bbclass
> index 9d46c035f6..bd21557c60 100644
> --- a/meta/classes-recipe/pypi.bbclass
> +++ b/meta/classes-recipe/pypi.bbclass
> @@ -54,3 +54,6 @@ UPSTREAM_CHECK_URI ?= "https://pypi.org/simple/${@pypi_normalize(d)}/"
>  UPSTREAM_CHECK_REGEX ?= "${UPSTREAM_CHECK_PYPI_PACKAGE}-(?P<pver>(\d+[\.\-_]*)+).(tar\.gz|tgz|zip|tar\.bz2)"
>
>  CVE_PRODUCT ?= "python:${PYPI_PACKAGE}"
> +
> +# Generate ecosystem-specific Package URL for SPDX
> +SPDX_PACKAGE_URLS =+ "pkg:pypi/${@pypi_normalize(d)}@${PV} "
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads
  2026-03-24 17:12   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
@ 2026-03-26 20:14     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-26 20:14 UTC (permalink / raw)
  To: Stefano Tondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 11:14 AM Stefano Tondo <stondo@gmail.com> wrote:
>
> Enrich Git source download packages in the SPDX 3.0 output with:
> - software_packageVersion set to the full SHA-1 commit hash
> - software_packageUrl set to a PURL for known Git hosting services
> - VCS external reference pointing to the repository URL
>
> The PURL generation recognizes github.com by default and supports
> additional hosting services via the SPDX_GIT_PURL_MAPPINGS variable
> (format: 'domain:purl_type', e.g. 'gitlab.example.com:pkg:gitlab').
>
> Only Git source downloads are enriched. Non-Git downloads are left
> unchanged since their ecosystem PURLs are already set on the recipe
> package by SPDX_PACKAGE_URLS from the previous patch.
>

LGTM, thanks

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/classes/create-spdx-3.0.bbclass |   7 ++
>  meta/lib/oe/spdx30_tasks.py          | 102 +++++++++++++++++++++++++++
>  2 files changed, 109 insertions(+)
>
> diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
> index 9a6606dce6..432adb14cd 100644
> --- a/meta/classes/create-spdx-3.0.bbclass
> +++ b/meta/classes/create-spdx-3.0.bbclass
> @@ -156,6 +156,13 @@ SPDX_RECIPE_SBOM_NAME ?= "${PN}-recipe-sbom"
>  SPDX_RECIPE_SBOM_NAME[doc] = "The name of output recipe SBoM when using \
>      create_recipe_sbom"
>
> +SPDX_GIT_PURL_MAPPINGS ??= ""
> +SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
> +    mappings to configure PURL generation for Git source downloads. \
> +    For example, 'gitlab.example.com:pkg:gitlab' maps repositories hosted \
> +    on gitlab.example.com to the pkg:gitlab PURL type. \
> +    github.com is always mapped to pkg:github by default."
> +
>  IMAGE_CLASSES:append = " create-spdx-image-3.0"
>  SDK_CLASSES += "create-spdx-sdk-3.0"
>
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 51e10befba..cd9672c18e 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -14,6 +14,7 @@ import oe.spdx_common
>  import oe.sdk
>  import os
>  import re
> +import urllib.parse
>
>  from contextlib import contextmanager
>  from datetime import datetime, timezone
> @@ -384,6 +385,105 @@ def collect_dep_sources(dep_objsets, dest):
>              index_sources_by_hash(e.to, dest)
>
>
> +
> +def _generate_git_purl(d, download_location, srcrev):
> +    """Generate a Package URL for a Git source from its download location.
> +
> +    Parses the Git URL to identify the hosting service and generates the
> +    appropriate PURL type. Supports github.com by default and custom
> +    mappings via SPDX_GIT_PURL_MAPPINGS.
> +
> +    Returns the PURL string or None if no mapping matches.
> +    """
> +    if not download_location or not download_location.startswith('git+'):
> +        return None
> +
> +    git_url = download_location[4:]  # Remove 'git+' prefix
> +
> +    # Default handler: github.com
> +    git_purl_handlers = {
> +        'github.com': 'pkg:github',
> +    }
> +
> +    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
> +    # Format: "domain1:purl_type1 domain2:purl_type2"
> +    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
> +    if custom_mappings:
> +        for mapping in custom_mappings.split():
> +            parts = mapping.split(':', 1)
> +            if len(parts) == 2:
> +                git_purl_handlers[parts[0]] = parts[1]
> +                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
> +            else:
> +                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
> +
> +    try:
> +        parsed = urllib.parse.urlparse(git_url)
> +    except Exception:
> +        return None
> +
> +    hostname = parsed.hostname
> +    if not hostname:
> +        return None
> +
> +    for domain, purl_type in git_purl_handlers.items():
> +        if hostname == domain:
> +            path = parsed.path.strip('/')
> +            path_parts = path.split('/')
> +            if len(path_parts) >= 2:
> +                owner = path_parts[0]
> +                repo = path_parts[1].replace('.git', '')
> +                return f"{purl_type}/{owner}/{repo}@{srcrev}"
> +            break
> +
> +    return None
> +
> +
> +def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
> +    """Enrich a Git source download package with version, PURL, and external refs.
> +
> +    For Git sources, extracts the full SHA-1 from SRCREV as the version,
> +    generates PURLs for known hosting services, and adds VCS external
> +    references.
> +    """
> +    version = None
> +    purl = None
> +
> +    if fd.type == "git":
> +        # Use full SHA-1 from fd.revision
> +        srcrev = getattr(fd, 'revision', None)
> +        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
> +            version = srcrev
> +
> +        # Generate PURL for Git hosting services
> +        download_location = getattr(dl, 'software_downloadLocation', None)
> +        if version and download_location:
> +            purl = _generate_git_purl(d, download_location, version)
> +
> +    if version:
> +        dl.software_packageVersion = version
> +
> +    if purl:
> +        dl.software_packageUrl = purl
> +
> +    # Add VCS external reference for Git repositories
> +    download_location = getattr(dl, 'software_downloadLocation', None)
> +    if download_location and isinstance(download_location, str):
> +        if download_location.startswith('git+'):
> +            git_url = download_location[4:]
> +            if '@' in git_url:
> +                git_url = git_url.split('@')[0]
> +
> +            dl.externalRef = dl.externalRef or []
> +            dl.externalRef.append(
> +                oe.spdx30.ExternalRef(
> +                    externalRefType=oe.spdx30.ExternalRefType.vcs,
> +                    locator=[git_url],
> +                )
> +            )
> +
> +
> +
>  def add_download_files(d, objset):
>      inputs = set()
>
> @@ -447,6 +547,8 @@ def add_download_files(d, objset):
>                  )
>              )
>
> +            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
> +
>              if fd.method.supports_checksum(fd):
>                  # TODO Need something better than hard coding this
>                  for checksum_id in ["sha256", "sha1"]:
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [OE-core] [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment
  2026-03-24 17:15   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
@ 2026-03-26 20:15     ` Joshua Watt
  0 siblings, 0 replies; 32+ messages in thread
From: Joshua Watt @ 2026-03-26 20:15 UTC (permalink / raw)
  To: stondo
  Cc: openembedded-core, richard.purdie, ross.burton, stefano.tondo.ext,
	peter.marko, adrian.freihofer, mathieu.dubois-briand

On Tue, Mar 24, 2026 at 11:15 AM Stefano Tondo via
lists.openembedded.org <stondo=gmail.com@lists.openembedded.org>
wrote:
>
> Add two new test methods to SPDX30Check:
>
> test_download_location_defensive_handling:
>   Builds m4 and verifies that SPDX generation succeeds and any
>   external references present are properly structured with valid
>   types and locator strings.
>
> test_version_extraction_patterns:
>   Builds opkg-utils (a Git-based recipe) and verifies that source
>   download packages carry the full SHA-1 commit hash as their
>   software_packageVersion.
>

LGTM, thanks

Reviewed-by: Joshua Watt <JPEWhacker@gmail.com>

> Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
> ---
>  meta/lib/oeqa/selftest/cases/spdx.py | 76 ++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
>
> diff --git a/meta/lib/oeqa/selftest/cases/spdx.py b/meta/lib/oeqa/selftest/cases/spdx.py
> index af1144c1e5..9347e0bf7b 100644
> --- a/meta/lib/oeqa/selftest/cases/spdx.py
> +++ b/meta/lib/oeqa/selftest/cases/spdx.py
> @@ -428,3 +428,79 @@ class SPDX30Check(SPDX3CheckBase, OESelftestTestCase):
>                  value, ["enabled", "disabled"],
>                  f"Unexpected PACKAGECONFIG value '{value}' for {key}"
>              )
> +
> +    def test_download_location_defensive_handling(self):
> +        """Test that download_location handling is defensive.
> +
> +        Verifies SPDX generation succeeds and external references are
> +        properly structured when download_location retrieval works.
> +        """
> +        objset = self.check_recipe_spdx(
> +            "m4",
> +            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-m4.spdx.json",
> +        )
> +
> +        found_external_refs = False
> +        for pkg in objset.foreach_type(oe.spdx30.software_Package):
> +            if pkg.externalRef:
> +                found_external_refs = True
> +                for ref in pkg.externalRef:
> +                    self.assertIsNotNone(ref.externalRefType)
> +                    self.assertIsNotNone(ref.locator)
> +                    self.assertGreater(len(ref.locator), 0, "Locator should have at least one entry")
> +                    for loc in ref.locator:
> +                        self.assertIsInstance(loc, str)
> +                break
> +
> +        self.logger.info(
> +            f"External references {'found' if found_external_refs else 'not found'} "
> +            f"in SPDX output (defensive handling verified)"
> +        )
> +
> +    def test_version_extraction_patterns(self):
> +        """Test that version extraction works for various package formats.
> +
> +        Verifies that Git source downloads carry extracted versions and that
> +        the reported version strings are well-formed.
> +        """
> +        objset = self.check_recipe_spdx(
> +            "opkg-utils",
> +            "{DEPLOY_DIR_SPDX}/{SSTATE_PKGARCH}/builds/build-opkg-utils.spdx.json",
> +        )
> +
> +        # Collect all packages with versions
> +        packages_with_versions = []
> +        for pkg in objset.foreach_type(oe.spdx30.software_Package):
> +            if pkg.software_packageVersion:
> +                packages_with_versions.append((pkg.name, pkg.software_packageVersion))
> +
> +        self.assertGreater(
> +            len(packages_with_versions), 0,
> +            "Should find packages with extracted versions"
> +        )
> +
> +        for name, version in packages_with_versions:
> +            self.assertRegex(
> +                version,
> +                r"^[0-9a-f]{40}$",
> +                f"Expected Git source version for {name} to be a full SHA-1",
> +            )
> +
> +        self.logger.info(f"Found {len(packages_with_versions)} packages with versions")
> +
> +        # Log some examples for debugging
> +        for name, version in packages_with_versions[:5]:
> +            self.logger.info(f"  {name}: {version}")
> +
> +        # Verify that versions follow expected patterns
> +        for name, version in packages_with_versions:
> +            # Version should not be empty
> +            self.assertIsNotNone(version)
> +            self.assertNotEqual(version, "")
> +
> +            # Version should contain digits
> +            self.assertRegex(
> +                version,
> +                r'\d',
> +                f"Version '{version}' for package '{name}' should contain digits"
> +            )
> --
> 2.53.0
>
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#233819): https://lists.openembedded.org/g/openembedded-core/message/233819
> Mute This Topic: https://lists.openembedded.org/mt/118487364/3616693
> Group Owner: openembedded-core+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub [JPEWhacker@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2026-03-26 20:16 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-23 21:07 [OE-core][PATCH v13 0/4] SPDX 3.0 SBOM enrichment and compliance improvements Stefano Tondo
2026-03-23 21:07 ` [PATCH v13 1/4] spdx30: Add configurable file exclusion pattern support Stefano Tondo
2026-03-23 21:07 ` [PATCH v13 2/4] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
2026-03-23 21:07 ` [PATCH v13 3/4] spdx30: Enrich source downloads with version and PURL Stefano Tondo
2026-03-23 21:07 ` [PATCH v13 4/4] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
2026-03-24 10:26   ` Richard Purdie
2026-03-24 14:48   ` Joshua Watt
2026-03-24 13:29 ` [OE-core][PATCH v14 0/4] SPDX 3.0 SBOM enrichment and compliance improvements stondo
2026-03-24 13:29   ` [OE-core][PATCH v14 1/4] spdx30: Add configurable file exclusion pattern support stondo
2026-03-24 14:22     ` Joshua Watt
2026-03-24 13:29   ` [OE-core][PATCH v14 2/4] spdx30: Add supplier support for image and SDK SBOMs stondo
2026-03-24 14:24     ` Joshua Watt
2026-03-24 13:29   ` [OE-core][PATCH v14 3/4] spdx30: Enrich source downloads with version and PURL stondo
2026-03-24 14:46     ` Joshua Watt
2026-03-24 13:29   ` [OE-core][PATCH v14 4/4] oeqa/selftest: Add tests for source download enrichment stondo
2026-03-24 17:12   ` [PATCH v16 0/5] spdx30: PURL and " Stefano Tondo
2026-03-24 17:12   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
2026-03-24 17:12   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
2026-03-24 17:12   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
2026-03-24 17:12   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
2026-03-26 20:14     ` Joshua Watt
2026-03-24 17:12   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
2026-03-24 17:14   ` [PATCH v16 0/5] spdx30: PURL and " Stefano Tondo
2026-03-24 17:14   ` [PATCH v16 1/5] spdx30: Add configurable file exclusion pattern support Stefano Tondo
2026-03-26 20:11     ` Joshua Watt
2026-03-24 17:14   ` [PATCH v16 2/5] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
2026-03-26 20:12     ` Joshua Watt
2026-03-24 17:15   ` [PATCH v16 3/5] spdx30: Add ecosystem PURLs for recipe classes Stefano Tondo
2026-03-26 20:13     ` Joshua Watt
2026-03-24 17:15   ` [PATCH v16 4/5] spdx30: Add Git version and PURL to source downloads Stefano Tondo
2026-03-24 17:15   ` [PATCH v16 5/5] oeqa/selftest: Add tests for source download enrichment Stefano Tondo
2026-03-26 20:15     ` [OE-core] " Joshua Watt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox