From: Stefano Tondo <stondo@gmail.com>
To: openembedded-core@lists.openembedded.org
Cc: stefano.tondo.ext@siemens.com, adrian.freihofer@siemens.com,
Peter.Marko@siemens.com, jpewhacker@gmail.com,
Ross.Burton@arm.com, mathieu.dubois-briand@bootlin.com
Subject: [PATCH v3 06/11] spdx30: Enrich source downloads with external refs and PURLs
Date: Tue, 24 Feb 2026 17:29:41 +0100 [thread overview]
Message-ID: <20260224162946.4000445-7-stondo@gmail.com> (raw)
In-Reply-To: <20260224162946.4000445-1-stondo@gmail.com>
From: Stefano Tondo <stefano.tondo.ext@siemens.com>
Enrich source download packages in SPDX SBOMs with comprehensive
source tracking metadata:
External references:
- VCS references for Git repositories (ExternalRefType.vcs)
- Distribution references for HTTP/HTTPS/FTP archive downloads
- Homepage references from HOMEPAGE variable
Source PURL qualifiers:
- Add ?type=source qualifier for recipe source tarballs to
distinguish them from built runtime packages
- Only applied to pkg:yocto or pkg:generic PURLs (ecosystem-specific
PURLs like pkg:npm already have their own semantics)
Version extraction with priority chain:
- Priority 1: ;tag= parameter from SRC_URI (preferred, provides
meaningful versions like '1.2.3')
- Priority 2: fd.revision (resolved Git commit hash)
- Priority 3: SRCREV variable
- Priority 4: PV from recipe metadata
PURL generation:
- Generate pkg:github PURLs for GitHub-hosted repositories
- Extensible via SPDX_GIT_PURL_MAPPINGS for other hosting services
- Ecosystem-specific version and PURL integration for Rust crates,
Go modules, PyPI, NPM packages
Also add defensive error handling for download_location retrieval
and wire up extract_dependency_metadata() for non-Git sources.
Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
meta/lib/oe/spdx30_tasks.py | 187 +++++++++++++++++++++++++-----------
1 file changed, 129 insertions(+), 58 deletions(-)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 4493b6e5e1..7fc831225a 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -20,7 +20,6 @@ from datetime import datetime, timezone
from pathlib import Path
-
def extract_dependency_metadata(d, file_name):
"""Extract ecosystem-specific PURL for dependency packages.
@@ -573,81 +572,154 @@ def add_download_files(d, objset):
dep_version = None
dep_purl = None
- # For Git repositories, extract version from SRCREV
+ # Get download location for external references
+ download_location = None
+ try:
+ download_location = oe.spdx_common.fetch_data_to_uri(fd, fd.name)
+ except Exception as e:
+ bb.debug(1, f"Could not get download location for {file_name}: {e}")
+
+ # For Git repositories, extract version from SRCREV or tag
if fd.type == "git":
srcrev = None
- # Try to get SRCREV for this specific source URL
+ # Prefer ;tag= parameter from SRC_URI
+ if hasattr(fd, 'parm') and fd.parm and 'tag' in fd.parm:
+ tag = fd.parm['tag']
+ if tag and tag not in ['${AUTOREV}', 'AUTOINC', 'INVALID']:
+ dep_version = tag[1:] if tag.startswith('v') else tag
+ version_source = "tag"
+ # Try fd.revision for resolved SRCREV
# Note: fd.revision (not fd.revisions) contains the resolved revision
- if hasattr(fd, 'revision') and fd.revision:
+ if not dep_version and hasattr(fd, 'revision') and fd.revision:
srcrev = fd.revision
- bb.debug(1, f"SPDX: Found fd.revision for {file_name}: {srcrev}")
-
- # Fallback to general SRCREV variable
- if not srcrev:
+ version_source = "fd.revision"
+ # Fallback to SRCREV variable
+ if not dep_version and not srcrev:
srcrev = d.getVar('SRCREV')
if srcrev:
- bb.debug(1, f"SPDX: Using SRCREV variable for {file_name}: {srcrev}")
-
- if srcrev and srcrev not in ['${AUTOREV}', 'AUTOINC', 'INVALID']:
- # Use first 12 characters of Git commit as version (standard Git short hash)
+ version_source = "SRCREV"
+ if not dep_version and srcrev and srcrev not in ['${AUTOREV}', 'AUTOINC', 'INVALID']:
dep_version = srcrev[:12] if len(srcrev) >= 12 else srcrev
- bb.debug(1, f"SPDX: Extracted Git version for {file_name}: {dep_version}")
-
- # Generate PURL for Git hosting services
- # Reference: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
- download_location = oe.spdx_common.fetch_data_to_uri(fd, fd.name)
- if download_location and download_location.startswith('git+'):
- git_url = download_location[4:] # Remove 'git+' prefix
-
- # Build Git PURL handlers from default + custom mappings
- # Format: 'domain': ('purl_type', lambda to extract path)
- # Can be extended in meta-siemens or other layers via SPDX_GIT_PURL_MAPPINGS
- git_purl_handlers = {
- 'github.com': ('pkg:github', lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None),
- # Note: pkg:gitlab is NOT in official PURL spec, so we omit it by default
- # Other Git hosts can be added via SPDX_GIT_PURL_MAPPINGS
- }
-
- # Allow layers to extend PURL mappings via SPDX_GIT_PURL_MAPPINGS variable
- # Format: "domain1:purl_type1 domain2:purl_type2"
- # Example: SPDX_GIT_PURL_MAPPINGS = "gitlab.com:pkg:gitlab git.example.com:pkg:generic"
- custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
- if custom_mappings:
- for mapping in custom_mappings.split():
- try:
- domain, purl_type = mapping.split(':')
- # Use simple path handler for custom domains
- git_purl_handlers[domain] = (purl_type, lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None)
- bb.debug(2, f"SPDX: Added custom Git PURL mapping: {domain} -> {purl_type}")
- except ValueError:
- bb.warn(f"SPDX: Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
-
- for domain, (purl_type, path_handler) in git_purl_handlers.items():
- if f'://{domain}/' in git_url or f'//{domain}/' in git_url:
- # Extract path after domain
- path_start = git_url.find(f'{domain}/') + len(f'{domain}/')
- path = git_url[path_start:].split('/')
- purl_path = path_handler(path)
- if purl_path:
- dep_purl = f"{purl_type}/{purl_path}@{srcrev}"
- bb.debug(1, f"SPDX: Generated {purl_type} PURL: {dep_purl}")
- break
-
- # Fallback: use parent package version if no other version found
+ bb.debug(1, f"Extracted Git version for {file_name}: {dep_version} (from {version_source})")
+
+ # Generate PURL for Git hosting services
+ # Reference: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
+ if dep_version and download_location and isinstance(download_location, str) and download_location.startswith('git+'):
+ git_url = download_location[4:] # Remove 'git+' prefix
+
+ # Default Git PURL handler (github.com)
+ git_purl_handlers = {
+ 'github.com': ('pkg:github', lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None),
+ # Note: pkg:gitlab is NOT in official PURL spec, so we omit it by default
+ }
+
+ # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
+ # Format: "domain1:purl_type1 domain2:purl_type2"
+ # Example: SPDX_GIT_PURL_MAPPINGS = "gitlab.com:pkg:gitlab git.example.com:pkg:generic"
+ custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+ if custom_mappings:
+ for mapping in custom_mappings.split():
+ try:
+ domain, purl_type = mapping.split(':')
+ git_purl_handlers[domain] = (purl_type, lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None)
+ bb.debug(2, f"Added custom Git PURL mapping: {domain} -> {purl_type}")
+ except ValueError:
+ bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+ for domain, (purl_type, path_handler) in git_purl_handlers.items():
+ if f'://{domain}/' in git_url or f'//{domain}/' in git_url:
+ path_start = git_url.find(f'{domain}/') + len(f'{domain}/')
+ path = git_url[path_start:].split('/')
+ purl_path = path_handler(path)
+ if purl_path:
+ purl_version = dep_version if version_source == "tag" else (srcrev if srcrev else dep_version)
+ dep_purl = f"{purl_type}/{purl_path}@{purl_version}"
+ bb.debug(1, f"Generated {purl_type} PURL: {dep_purl}")
+ break
+
+ # Fallback to recipe PV
if not dep_version:
pv = d.getVar('PV')
if pv and pv not in ['git', 'AUTOINC', 'INVALID', '${PV}']:
dep_version = pv
- bb.debug(1, f"SPDX: Using parent PV for {file_name}: {dep_version}")
+ # Non-Git: try ecosystem-specific PURL
+ if fd.type != "git":
+ ecosystem_version, ecosystem_purl = extract_dependency_metadata(d, file_name)
+
+ if ecosystem_version and not dep_version:
+ dep_version = ecosystem_version
+ if ecosystem_purl and not dep_purl:
+ dep_purl = ecosystem_purl
+ bb.debug(1, f"Generated ecosystem PURL for {file_name}: {dep_purl}")
- # Set version and PURL if extracted
if dep_version:
dl.software_packageVersion = dep_version
if dep_purl:
dl.software_packageUrl = dep_purl
+ # Add ?type=source qualifier for source tarballs
+ if (primary_purpose == oe.spdx30.software_SoftwarePurpose.source and
+ fd.type != "git" and
+ file_name.endswith(('.tar.gz', '.tar.bz2', '.tar.xz', '.zip', '.tgz'))):
+
+ current_purl = dl.software_packageUrl
+ if current_purl:
+ purl_type = current_purl.split('/')[0] if '/' in current_purl else ''
+ if purl_type in ['pkg:yocto', 'pkg:generic']:
+ source_purl = f"{current_purl}?type=source"
+ dl.software_packageUrl = source_purl
+ else:
+ recipe_purl = oe.purl.get_base_purl(d)
+ if recipe_purl:
+ base_purl = recipe_purl
+ source_purl = f"{base_purl}?type=source"
+ dl.software_packageUrl = source_purl
+ # Add external references
+
+ # VCS reference for Git repositories
+ if fd.type == "git" and download_location and isinstance(download_location, str) and download_location.startswith('git+'):
+ git_url = download_location[4:] # Remove 'git+' prefix
+ # Clean up URL (remove commit hash if present)
+ if '@' in git_url:
+ git_url = git_url.split('@')[0]
+
+ dl.externalRef = dl.externalRef or []
+ dl.externalRef.append(
+ oe.spdx30.ExternalRef(
+ externalRefType=oe.spdx30.ExternalRefType.vcs,
+ locator=[git_url],
+ )
+ )
+
+ # Distribution reference for tarball/archive downloads
+ elif download_location and isinstance(download_location, str) and (
+ download_location.startswith('http://') or
+ download_location.startswith('https://') or
+ download_location.startswith('ftp://')):
+ dl.externalRef = dl.externalRef or []
+ dl.externalRef.append(
+ oe.spdx30.ExternalRef(
+ externalRefType=oe.spdx30.ExternalRefType.altDownloadLocation,
+ locator=[download_location],
+ )
+ )
+
+ # Homepage reference if available
+ homepage = d.getVar('HOMEPAGE')
+ if homepage:
+ homepage = homepage.strip()
+ dl.externalRef = dl.externalRef or []
+ # Only add if not already added as distribution reference
+ if not any(homepage in ref.locator for ref in dl.externalRef):
+ dl.externalRef.append(
+ oe.spdx30.ExternalRef(
+ externalRefType=oe.spdx30.ExternalRefType.altWebPage,
+ locator=[homepage],
+ )
+ )
+
if fd.method.supports_checksum(fd):
# TODO Need something better than hard coding this
for checksum_id in ["sha256", "sha1"]:
@@ -664,7 +736,6 @@ def add_download_files(d, objset):
)
)
- inputs.add(dl)
return inputs
--
2.53.0
next prev parent reply other threads:[~2026-02-24 16:30 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-24 16:29 [PATCH v3 00/11] spdx30: SBOM enrichment and documentation Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 01/11] spdx30: Add configurable file filtering support Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 02/11] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 03/11] spdx30: Add ecosystem-specific PURL generation Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 04/11] spdx30: Add version extraction from SRCREV for Git source components Stefano Tondo
2026-02-26 8:28 ` Mathieu Dubois-Briand
2026-02-24 16:29 ` [PATCH v3 05/11] spdx30: Add SPDX_GIT_PURL_MAPPINGS for Git hosting Stefano Tondo
2026-02-24 16:29 ` Stefano Tondo [this message]
2026-02-24 16:29 ` [PATCH v3 07/11] spdx30: Include recipe base PURL in package external identifiers Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 08/11] oeqa/selftest: Add test for download_location defensive handling Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 09/11] spdx.py: Add test for version extraction patterns Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 10/11] cve_check: Escape special characters in CPE 2.3 formatted strings Stefano Tondo
2026-02-24 16:29 ` [PATCH v3 11/11] spdx-common: Add documentation for undocumented SPDX variables Stefano Tondo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260224162946.4000445-7-stondo@gmail.com \
--to=stondo@gmail.com \
--cc=Peter.Marko@siemens.com \
--cc=Ross.Burton@arm.com \
--cc=adrian.freihofer@siemens.com \
--cc=jpewhacker@gmail.com \
--cc=mathieu.dubois-briand@bootlin.com \
--cc=openembedded-core@lists.openembedded.org \
--cc=stefano.tondo.ext@siemens.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.