public inbox for openembedded-core@lists.openembedded.org
From: stondo@gmail.com
To: openembedded-core@lists.openembedded.org
Cc: JPEWhacker@gmail.com, richard.purdie@linuxfoundation.org,
	stefano.tondo.ext@siemens.com, Peter.Marko@siemens.com,
	adrian.freihofer@siemens.com
Subject: [OE-core][PATCH v10 4/7] spdx30: Enrich source downloads with version and PURL
Date: Fri, 20 Mar 2026 17:49:48 +0100
Message-ID: <20260320164951.128572-5-stondo@gmail.com>
In-Reply-To: <20260320164951.128572-1-stondo@gmail.com>

From: Stefano Tondo <stefano.tondo.ext@siemens.com>

Add version extraction, PURL generation, and external references
to source download packages in SPDX 3.0 SBOMs:

- Extract version from SRCREV for Git sources (full SHA-1)
- Generate PURLs for Git sources on github.com by default
- Support custom mappings via the SPDX_GIT_PURL_MAPPINGS variable
  (format: "domain:purl_type"; entries are parsed with split(':', 1),
  so the PURL type may itself contain a colon, e.g. "pkg:gitlab")
- Use ecosystem PURLs from SPDX_PACKAGE_URLS for non-Git sources
- Add VCS external references for Git downloads
- Add distribution external references for tarball downloads
- Parse Git URLs using urllib.parse
- Extract logic into _generate_git_purl() and
  _enrich_source_package() helpers
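
As a rough illustration of the Git-related items above, the following
standalone sketch turns a download location of the form
"git+<url>@<rev>" into a PURL using urllib.parse. The function name
git_purl_sketch, the handlers argument, and the example URL are
hypothetical. Stripping a trailing "@<rev>" and a trailing ".git" from
the path is an assumption of this sketch, not a description of the
patch.

```python
import urllib.parse


def git_purl_sketch(download_location, srcrev, handlers=None):
    """Return a PURL for a 'git+<url>' location, or None if no host matches."""
    handlers = handlers or {"github.com": "pkg:github"}
    if not download_location.startswith("git+"):
        return None

    url = download_location[len("git+"):]
    parsed = urllib.parse.urlparse(url)
    purl_type = handlers.get(parsed.hostname)
    if purl_type is None:
        return None

    # Drop an optional trailing '@<rev>' from the path (assumption of this sketch).
    path = parsed.path.strip("/").split("@", 1)[0]
    parts = path.split("/")
    if len(parts) < 2:
        return None

    owner, repo = parts[0], parts[1]
    # Remove only a trailing '.git', not every occurrence of the substring.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{purl_type}/{owner}/{repo}@{srcrev}"


# Hypothetical repository and revision, for illustration only.
print(git_purl_sketch("git+https://github.com/example/project.git@0123abcd", "0123abcd"))
```

Hosts without an entry in the handlers table yield None, so no PURL
would be attached in that case.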

For non-Git sources, version is not set from PV since the recipe
version does not necessarily reflect the version of individual
downloaded files. Ecosystem PURLs (which include version) from
SPDX_PACKAGE_URLS are still used when available.
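
The selection rule for non-Git sources can be sketched as below: take
the first entry of SPDX_PACKAGE_URLS that is not a pkg:yocto PURL. The
helper name first_ecosystem_purl and the sample value are hypothetical
and only meant to illustrate the rule.

```python
def first_ecosystem_purl(package_urls):
    """Return the first PURL that is not a pkg:yocto PURL, or None."""
    for url in (package_urls or "").split():
        if not url.startswith("pkg:yocto"):
            return url
    return None


# Hypothetical SPDX_PACKAGE_URLS value, for illustration only.
urls = "pkg:yocto/core/example@1.0 pkg:pypi/example@1.0"
print(first_ecosystem_purl(urls))
```

If the variable only contains pkg:yocto entries, or is empty, the
helper returns None and no PURL is set on the download.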

The SPDX_GIT_PURL_MAPPINGS variable allows configuring PURL
generation for self-hosted Git services (e.g., GitLab).
github.com is mapped to pkg:github by default; a custom mapping for
the same domain overrides it.
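
A minimal sketch of how such a mapping string could be parsed follows.
The function name parse_git_purl_mappings and the returned list of
invalid entries are assumptions made for this example. Only the
"domain:purl_type" format and the split(':', 1) parsing come from the
patch.

```python
def parse_git_purl_mappings(value):
    """Parse 'domain:purl_type' entries into a dict, collecting malformed ones."""
    handlers = {"github.com": "pkg:github"}  # default mapping
    invalid = []
    for entry in (value or "").split():
        # maxsplit=1 keeps the colon inside the PURL type, e.g. 'pkg:gitlab'.
        parts = entry.split(":", 1)
        if len(parts) == 2:
            handlers[parts[0]] = parts[1]
        else:
            invalid.append(entry)
    return handlers, invalid


# Hypothetical mapping value: one valid entry and one malformed entry.
handlers, invalid = parse_git_purl_mappings("gitlab.example.com:pkg:gitlab bogus")
print(handlers)
print(invalid)
```

Because custom entries are written into the same table as the default,
an entry for github.com replaces the built-in pkg:github mapping.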

Signed-off-by: Stefano Tondo <stefano.tondo.ext@siemens.com>
---
 meta/classes/create-spdx-3.0.bbclass |   7 ++
 meta/lib/oe/spdx30_tasks.py          | 117 +++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/meta/classes/create-spdx-3.0.bbclass b/meta/classes/create-spdx-3.0.bbclass
index def2dacbc3..9e912b34e1 100644
--- a/meta/classes/create-spdx-3.0.bbclass
+++ b/meta/classes/create-spdx-3.0.bbclass
@@ -152,6 +152,13 @@ SPDX_PACKAGE_URLS[doc] = "A space separated list of Package URLs (purls) for \
     Override this variable to replace the default, otherwise append or prepend \
     to add additional purls."
 
+SPDX_GIT_PURL_MAPPINGS ??= ""
+SPDX_GIT_PURL_MAPPINGS[doc] = "A space separated list of domain:purl_type \
+    mappings to configure PURL generation for Git source downloads. \
+    For example, "gitlab.example.com:pkg:gitlab" maps repositories hosted \
+    on gitlab.example.com to the pkg:gitlab PURL type. \
+    github.com is mapped to pkg:github by default."
+
 IMAGE_CLASSES:append = " create-spdx-image-3.0"
 SDK_CLASSES += "create-spdx-sdk-3.0"
 
diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
index 8aaafea616..5639137520 100644
--- a/meta/lib/oe/spdx30_tasks.py
+++ b/meta/lib/oe/spdx30_tasks.py
@@ -14,6 +14,7 @@ import oe.spdx_common
 import oe.sdk
 import os
 import re
+import urllib.parse
 
 from contextlib import contextmanager
 from datetime import datetime, timezone
@@ -378,6 +379,120 @@ def collect_dep_sources(dep_objsets, dest):
             index_sources_by_hash(e.to, dest)
 
 
+def _generate_git_purl(d, download_location, srcrev):
+    """Generate a Package URL for a Git source from its download location.
+
+    Parses the Git URL to identify the hosting service and generates the
+    appropriate PURL type. Supports github.com by default and custom
+    mappings via SPDX_GIT_PURL_MAPPINGS.
+
+    Returns the PURL string or None if no mapping matches.
+    """
+    if not download_location or not download_location.startswith('git+'):
+        return None
+
+    git_url = download_location[4:]  # Remove 'git+' prefix
+
+    # Default handler: github.com
+    git_purl_handlers = {
+        'github.com': 'pkg:github',
+    }
+
+    # Custom PURL mappings from SPDX_GIT_PURL_MAPPINGS
+    # Format: "domain1:purl_type1 domain2:purl_type2"
+    custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
+    if custom_mappings:
+        for mapping in custom_mappings.split():
+            parts = mapping.split(':', 1)
+            if len(parts) == 2:
+                git_purl_handlers[parts[0]] = parts[1]
+                bb.debug(2, f"Added custom Git PURL mapping: {parts[0]} -> {parts[1]}")
+            else:
+                bb.warn(f"Invalid SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
+
+    try:
+        parsed = urllib.parse.urlparse(git_url)
+    except Exception:
+        return None
+
+    hostname = parsed.hostname
+    if not hostname:
+        return None
+
+    for domain, purl_type in git_purl_handlers.items():
+        if hostname == domain:
+            path = parsed.path.strip('/')
+            path_parts = path.split('/')
+            if len(path_parts) >= 2:
+                owner = path_parts[0]
+                repo = path_parts[1].replace('.git', '')
+                return f"{purl_type}/{owner}/{repo}@{srcrev}"
+            break
+
+    return None
+
+
+def _enrich_source_package(d, dl, fd, file_name, primary_purpose):
+    """Enrich a source download package with version, PURL, and external refs.
+
+    Extracts version from SRCREV for Git sources, generates PURLs for
+    known hosting services, and adds external references for VCS and
+    distribution URLs.
+    """
+    version = None
+    purl = None
+
+    if fd.type == "git":
+        # Use full SHA-1 from fd.revision
+        srcrev = getattr(fd, 'revision', None)
+        if srcrev and srcrev not in {'${AUTOREV}', 'AUTOINC', 'INVALID'}:
+            version = srcrev
+
+        # Generate PURL for Git hosting services
+        download_location = getattr(dl, 'software_downloadLocation', None)
+        if version and download_location:
+            purl = _generate_git_purl(d, download_location, version)
+    else:
+        # Use ecosystem PURL from SPDX_PACKAGE_URLS if available
+        package_urls = (d.getVar('SPDX_PACKAGE_URLS') or '').split()
+        for url in package_urls:
+            if not url.startswith('pkg:yocto'):
+                purl = url
+                break
+
+    if version:
+        dl.software_packageVersion = version
+
+    if purl:
+        dl.software_packageUrl = purl
+
+    # Add external references
+    download_location = getattr(dl, 'software_downloadLocation', None)
+    if download_location and isinstance(download_location, str):
+        dl.externalRef = dl.externalRef or []
+
+        if download_location.startswith('git+'):
+            # VCS reference for Git repositories
+            git_url = download_location[4:]
+            if '@' in git_url:
+                git_url = git_url.split('@')[0]
+
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.vcs,
+                    locator=[git_url],
+                )
+            )
+        elif download_location.startswith(('http://', 'https://', 'ftp://')):
+            # Distribution reference for tarball/archive downloads
+            dl.externalRef.append(
+                oe.spdx30.ExternalRef(
+                    externalRefType=oe.spdx30.ExternalRefType.altDownloadLocation,
+                    locator=[download_location],
+                )
+            )
+
+
 def add_download_files(d, objset):
     inputs = set()
 
@@ -441,6 +556,8 @@ def add_download_files(d, objset):
                 )
             )
 
+            _enrich_source_package(d, dl, fd, file_name, primary_purpose)
+
             if fd.method.supports_checksum(fd):
                 # TODO Need something better than hard coding this
                 for checksum_id in ["sha256", "sha1"]:
-- 
2.53.0


