From: Stefano Tondo <stondo@gmail.com>
To: openembedded-core@lists.openembedded.org
Cc: stefano.tondo.ext@siemens.com, adrian.freihofer@siemens.com,
Peter.Marko@siemens.com, jpewhacker@gmail.com,
Ross.Burton@arm.com
Subject: [PATCH v2 04/18] spdx30: Add version extraction from SRCREV for Git source components
Date: Sat, 21 Feb 2026 06:09:52 +0100
Message-ID: <20260221051006.335141-5-stondo@gmail.com>
In-Reply-To: <20260221051006.335141-1-stondo@gmail.com>
From: Stefano Tondo <stefano.tondo.ext@siemens.com>
Extract version information for Git-based source components in SPDX 3.0
SBOMs to improve SBOM completeness and enable better supply chain tracking.
Problem:
Git repositories fetched as SRC_URI entries currently appear in SBOMs
without version information (software_packageVersion is null). This makes
it difficult to track which specific revision of a dependency was used,
reducing SBOM usefulness for security and compliance tracking.
Solution:
- Extract SRCREV for Git sources and use it as packageVersion
- Use fd.revision attribute (the resolved Git commit)
- Fall back to the SRCREV variable if fd.revision is not available
- Use the first 12 characters of the commit hash as the version
  (abbreviated commit hash)
- Generate pkg:github PURLs for GitHub repositories (official PURL type)
- Add comprehensive debug logging for troubleshooting
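The selection order described in the list above can be sketched in isolation as follows. This is a minimal illustration, not the code from the patch: pick_git_version and PLACEHOLDER_REVS are hypothetical names, and the two arguments stand in for fd.revision and d.getVar('SRCREV').

```python
# Placeholder values that do not identify a real commit (same set as in the patch).
PLACEHOLDER_REVS = {"${AUTOREV}", "AUTOINC", "INVALID"}


def pick_git_version(fd_revision, srcrev_var, length=12):
    """Return an abbreviated commit hash, or None if no usable revision exists.

    Hypothetical helper: fd_revision stands in for fd.revision and
    srcrev_var for the value of the SRCREV variable.
    """
    # Prefer the revision resolved by the fetcher, then fall back to SRCREV.
    rev = fd_revision or srcrev_var
    if not rev or rev in PLACEHOLDER_REVS:
        return None
    # Slicing is safe for strings shorter than `length`, so no length check is needed.
    return rev[:length]
```

Because slicing never raises on short strings, the explicit length comparison is not required to get the same result.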
Impact:
- Git source components now have version information
- GitHub repositories get proper PURLs (pkg:github/owner/repo@commit)
- Enables tracking specific commit dependencies in SBOMs
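As a rough illustration of the pkg:github/owner/repo@commit form mentioned above, the sketch below builds such a PURL from a repository URL. github_purl is a hypothetical helper, not part of the patch. It makes two assumptions: that the URL may carry a trailing "@<revision>" suffix that must not leak into the repository name, and that owner and repository names are lowercased, as the purl type definition for github requires.

```python
from urllib.parse import urlsplit


def github_purl(git_url, commit):
    """Build pkg:github/<owner>/<repo>@<commit>, or None for non-GitHub URLs.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(git_url)
    if parts.hostname != "github.com":
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        return None
    owner = segments[0]
    # Assumption: drop a trailing "@<revision>" suffix if the URL carries one.
    repo = segments[1].split("@", 1)[0]
    # Strip only a trailing ".git", not every occurrence inside the name.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    if not repo:
        return None
    # The github purl type treats namespace and name as case-insensitive,
    # so both are lowercased here.
    return f"pkg:github/{owner.lower()}/{repo.lower()}@{commit}"
```

Parsing with urlsplit compares the host exactly, which avoids matching a domain that only appears as a substring elsewhere in the URL.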
Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
meta/lib/oe/spdx30_tasks.py | 79 +++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 0ee39ffcd5..970921e986 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -569,6 +569,85 @@ def add_download_files(d, objset):
                 )
             )
 
+            # Extract version and PURL for source packages
+            dep_version = None
+            dep_purl = None
+
+            # For Git repositories, extract version from SRCREV
+            if fd.type == "git":
+                srcrev = None
+
+                # Try to get SRCREV for this specific source URL
+                # Note: fd.revision (not fd.revisions) contains the resolved revision
+                if hasattr(fd, 'revision') and fd.revision:
+                    srcrev = fd.revision
+                    bb.debug(1, f"SPDX: Found fd.revision for {file_name}: {srcrev}")
+
+                # Fallback to general SRCREV variable
+                if not srcrev:
+                    srcrev = d.getVar('SRCREV')
+                    if srcrev:
+                        bb.debug(1, f"SPDX: Using SRCREV variable for {file_name}: {srcrev}")
+
+                if srcrev and srcrev not in ['${AUTOREV}', 'AUTOINC', 'INVALID']:
+                    # Use first 12 characters of Git commit as version (standard Git short hash)
+                    dep_version = srcrev[:12] if len(srcrev) >= 12 else srcrev
+                    bb.debug(1, f"SPDX: Extracted Git version for {file_name}: {dep_version}")
+
+                    # Generate PURL for Git hosting services
+                    # Reference: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
+                    download_location = oe.spdx_common.fetch_data_to_uri(fd, fd.name)
+                    if download_location and download_location.startswith('git+'):
+                        git_url = download_location[4:] # Remove 'git+' prefix
+
+                        # Build Git PURL handlers from default + custom mappings
+                        # Format: 'domain': ('purl_type', lambda to extract path)
+                        # Can be extended in meta-siemens or other layers via SPDX_GIT_PURL_MAPPINGS
+                        git_purl_handlers = {
+                            'github.com': ('pkg:github', lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None),
+                            # Note: pkg:gitlab is NOT in official PURL spec, so we omit it by default
+                            # Other Git hosts can be added via SPDX_GIT_PURL_MAPPINGS
+                        }
+
+                        # Allow layers to extend PURL mappings via SPDX_GIT_PURL_MAPPINGS variable
+                        # Format: "domain1:purl_type1 domain2:purl_type2"
+                        # Example: SPDX_GIT_PURL_MAPPINGS = "gitlab.com:pkg:gitlab git.example.com:pkg:generic"
+                        custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+                        if custom_mappings:
+                            for mapping in custom_mappings.split():
+                                try:
+                                    domain, purl_type = mapping.split(':', 1)
+                                    # Use simple path handler for custom domains
+                                    git_purl_handlers[domain] = (purl_type, lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None)
+                                    bb.debug(2, f"SPDX: Added custom Git PURL mapping: {domain} -> {purl_type}")
+                                except ValueError:
+                                    bb.warn(f"SPDX: Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+                        for domain, (purl_type, path_handler) in git_purl_handlers.items():
+                            if f'://{domain}/' in git_url or f'//{domain}/' in git_url:
+                                # Extract path after domain
+                                path_start = git_url.find(f'{domain}/') + len(f'{domain}/')
+                                path = git_url[path_start:].split('/')
+                                purl_path = path_handler(path)
+                                if purl_path:
+                                    dep_purl = f"{purl_type}/{purl_path}@{srcrev}"
+                                    bb.debug(1, f"SPDX: Generated {purl_type} PURL: {dep_purl}")
+                                    break
+
+            # Fallback: use parent package version if no other version found
+            if not dep_version:
+                pv = d.getVar('PV')
+                if pv and pv not in ['git', 'AUTOINC', 'INVALID', '${PV}']:
+                    dep_version = pv
+                    bb.debug(1, f"SPDX: Using parent PV for {file_name}: {dep_version}")
+
+            # Set version and PURL if extracted
+            if dep_version:
+                dl.software_packageVersion = dep_version
+
+            if dep_purl:
+                dl.software_packageUrl = dep_purl
+
             if fd.method.supports_checksum(fd):
                 # TODO Need something better than hard coding this
                 for checksum_id in ["sha256", "sha1"]:
--
2.53.0
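The SPDX_GIT_PURL_MAPPINGS format documented in the patch comments ("domain:purl_type", for example gitlab.com:pkg:gitlab) has a second colon inside the PURL type, so an entry has to be split on its first colon only. A plain "gitlab.com:pkg:gitlab".split(":") yields three parts and cannot be unpacked into two names. The sketch below shows one way to parse the variable value. parse_git_purl_mappings is a hypothetical helper, not part of the patch, and it returns malformed entries instead of logging them so it can run without BitBake.

```python
def parse_git_purl_mappings(value):
    """Parse "domain:purl_type ..." into a dict and a list of malformed entries.

    Hypothetical helper; `value` stands in for d.getVar('SPDX_GIT_PURL_MAPPINGS').
    """
    mappings, invalid = {}, []
    for entry in (value or "").split():
        # partition() splits on the first colon only, so "pkg:gitlab" stays intact.
        domain, sep, purl_type = entry.partition(":")
        if not sep or not domain or not purl_type:
            invalid.append(entry)
            continue
        mappings[domain] = purl_type
    return mappings, invalid
```

For instance, parse_git_purl_mappings("gitlab.com:pkg:gitlab bogus") maps gitlab.com to pkg:gitlab and reports "bogus" as malformed.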
Thread overview: 22+ messages
2026-02-21 5:09 [PATCH v2 00/18] spdx30: SBOM enrichment, lifecycle scope, and documentation Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 01/18] spdx30: Add configurable file filtering support Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 02/18] spdx30: Add supplier support for image and SDK SBOMs Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 03/18] spdx30: Add ecosystem-specific PURL generation Stefano Tondo
2026-02-21 5:09 ` Stefano Tondo [this message]
2026-02-22 13:34 ` [OE-core] [PATCH v2 04/18] spdx30: Add version extraction from SRCREV for Git source components Mathieu Dubois-Briand
2026-02-21 5:09 ` [PATCH v2 05/18] spdx30: Add SPDX_GIT_PURL_MAPPINGS for Git hosting Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 06/18] sbom30: Fix object deduplication to preserve complete data Stefano Tondo
2026-02-21 16:45 ` Joshua Watt
2026-02-21 5:09 ` [PATCH v2 07/18] spdx30: Enrich source downloads with external refs and PURLs Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 08/18] spdx30: Include recipe base PURL in package external identifiers Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 09/18] spdx30: Add image root metadata package with describes relationship Stefano Tondo
2026-02-21 16:47 ` Joshua Watt
2026-02-21 5:09 ` [PATCH v2 10/18] spdx30_tasks: Fix non-deterministic BUILDNAME in image package version Stefano Tondo
2026-02-21 5:09 ` [PATCH v2 11/18] spdx30: Add rootfs version and dependency scope classification Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 12/18] oeqa/selftest: Add test for download_location defensive handling Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 13/18] spdx.py: Add test for version extraction patterns Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 14/18] cve_check: Escape special characters in CPE 2.3 formatted strings Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 15/18] spdx-common: Declare SPDX_FORCE_*_SCOPE override variables Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 16/18] oeqa/selftest: Add test for lifecycle scope classification Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 17/18] spdx-common: Add documentation for undocumented SPDX variables Stefano Tondo
2026-02-21 5:10 ` [PATCH v2 18/18] spdx-common: Clarify documentation and make SPDX_LICENSES extensible Stefano Tondo