public inbox for openembedded-core@lists.openembedded.org
* [RFC] Adding PURL identifiers to SPDX 3.0.1 install package elements
@ 2026-04-08 13:19 Martin von Willebrand
  2026-04-08 21:23 ` [OE-core] " Richard Purdie
  0 siblings, 1 reply; 5+ messages in thread
From: Martin von Willebrand @ 2026-04-08 13:19 UTC (permalink / raw)
  To: openembedded-core

Hi all,

While working with ORT (OSS Review Toolkit) to analyse Yocto-generated 
SPDX 3.0.1 documents for ongoing vulnerability management and 
monitoring, I noticed that install package elements 
(`software_primaryPurpose: install`) carry only a wildcard CPE 
identifier, e.g.:

cpe:2.3:*:*:busybox:1.36.1:*:*:*:*:*:*:*

ORT recently released an SPDX analyzer (available since ORT 83.0) that
targets SPDX 3.0.1 documents generated by Yocto 5.0, which makes this gap
more visible: the analyzer can consume the package graph, but the available
identifiers are not sufficient to drive post-release CVE monitoring
against external vulnerability databases such as NVD or VulnerableCode,
since wildcard CPEs cannot be used directly as query keys.
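For illustration only, here is a minimal Python sketch that splits the CPE 2.3 string quoted above into its eleven named fields. The helper names are hypothetical (this is not OE or ORT code), and the sketch assumes there are no escaped colons inside a field. It shows that part and vendor are the ANY wildcard, so the identifier names a product and version but not who publishes it.

```python
# Field names of a CPE 2.3 formatted string, in order.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into a {field: value} dict.

    Simplification: assumes no escaped colons inside a field value.
    """
    prefix = "cpe:2.3:"
    if not cpe.startswith(prefix):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    values = cpe[len(prefix):].split(":")
    if len(values) != len(CPE23_FIELDS):
        raise ValueError(f"expected {len(CPE23_FIELDS)} fields, got {len(values)}")
    return dict(zip(CPE23_FIELDS, values))


def wildcard_fields(cpe: str) -> list:
    """Return the names of the fields set to the ANY wildcard '*'."""
    return [name for name, value in parse_cpe23(cpe).items() if value == "*"]


cpe = "cpe:2.3:*:*:busybox:1.36.1:*:*:*:*:*:*:*"
fields = parse_cpe23(cpe)
print(fields["part"], fields["vendor"], fields["product"], fields["version"])
# * * busybox 1.36.1
print(wildcard_fields(cpe)[:2])
# ['part', 'vendor']
```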

If I understand correctly, sbom-cve-check faces the same underlying
limitation, and we believe it would benefit from this change too,
although the two approaches are complementary rather than overlapping.

The upstream download URL for tarball-based packages is available at 
build time and is already derived from SRC_URI via `fetch_data_to_uri()` 
in `spdx30_tasks.py`. A PURL constructed from that URL and placed 
directly as an `externalIdentifier` on the install package element would 
give downstream consumers a durable, canonical identifier for 
post-release vulnerability monitoring.
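Here is a minimal sketch of what such a PURL could look like, using only the Python standard library. `generic_purl()` is a hypothetical helper. The URL is passed in as a plain string rather than taken from `fetch_data_to_uri()`, and the percent-encoding is simplified: ':' and '/' are left literal in the `download_url` qualifier value, which is an assumption about the encoding rules. A real implementation would more likely use the packageurl-python library.

```python
from urllib.parse import quote, urlsplit


def generic_purl(name: str, version: str, download_url: str) -> str:
    """Build a pkg:generic PURL carrying the upstream URL as a qualifier.

    Hypothetical helper: in OE the URL would come from the SRC_URI fetch
    data; here it is simply passed in as a string.
    """
    scheme = urlsplit(download_url).scheme
    if scheme not in ("http", "https", "ftp"):
        raise ValueError(f"unsupported scheme for download_url: {scheme!r}")
    # Percent-encode name and version fully; in the qualifier value keep
    # ':' and '/' literal but encode '?', '&', '=' and similar characters.
    return "pkg:generic/{}@{}?download_url={}".format(
        quote(name, safe=""),
        quote(version, safe=""),
        quote(download_url, safe=":/"),
    )


print(generic_purl("busybox", "1.36.1",
                   "https://busybox.net/downloads/busybox-1.36.1.tar.bz2"))
# pkg:generic/busybox@1.36.1?download_url=https://busybox.net/downloads/busybox-1.36.1.tar.bz2
```

The `download_url` qualifier keeps the upstream location attached to the identifier, while name and version remain the matching keys.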

Before drafting a patch I wanted to ask:

1. Is there a specific reason PURL is not currently emitted on install
package elements (policy or a technical constraint), or has it simply not
been done yet?
2. Would a contribution adding PURL, for example as an
`externalIdentifier` on install packages (derived from SRC_URI fetch
data), be welcome in OE-Core master?

Happy to hear comments and to discuss scope and approach before writing code.

Thanks,
Martin von Willebrand
Double Open

-- 
Double Open Oy
c/o HH Partners, Attorneys-at-law
Eteläesplanadi 22 A
00130 Helsinki, Finland
Registered Office: Helsinki
Trade Register: Finnish Trade Register (PRH)
Business ID: FI33824962
Managing Director: Martin von Willebrand



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [OE-core] [RFC] Adding PURL identifiers to SPDX 3.0.1 install package elements
  2026-04-08 13:19 [RFC] Adding PURL identifiers to SPDX 3.0.1 install package elements Martin von Willebrand
@ 2026-04-08 21:23 ` Richard Purdie
  2026-04-09  8:55   ` Martin von Willebrand
From: Richard Purdie @ 2026-04-08 21:23 UTC (permalink / raw)
  To: martin.vonwillebrand, openembedded-core; +Cc: Joshua Watt

On Wed, 2026-04-08 at 16:19 +0300, Martin von Willebrand via lists.openembedded.org wrote:
> [...]

I'm not sure this is as simple as it first appears. We support the
notion of "premirrors" and "mirrors", which are searched before and
after the primary SRC_URI. We validate a checksum of the resulting
download to verify we got what we expected, but the download doesn't
always come from that SRC_URI; it can also come from a cache. I guess
we'd assume you use the unmodified original SRC_URI?
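A toy Python model of that lookup order shows why the URL a source was actually fetched from need not be the SRC_URI, while the checksum still pins the content. It is a hypothetical function, not BitBake's fetcher API. Real PREMIRRORS/MIRRORS entries are regex-to-URL mappings; here each mirror is already a full URL.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple


def fetch_with_mirrors(
    src_uri: str,
    premirrors: Sequence[str],
    mirrors: Sequence[str],
    expected_sha256: str,
    download: Callable[[str], Optional[bytes]],
) -> Tuple[str, bytes]:
    """Try premirrors, then SRC_URI, then mirrors, and return the first
    (url, data) pair whose SHA-256 matches the expected checksum.

    `download` returns the bytes at a URL, or None if unavailable.
    """
    for url in [*premirrors, src_uri, *mirrors]:
        data = download(url)
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
            return url, data
    raise RuntimeError(f"no location matched the checksum for {src_uri}")


# Simulated network: only a local premirror holds the tarball.
tarball = b"example tarball contents"
digest = hashlib.sha256(tarball).hexdigest()
available = {"file:///srv/premirror/busybox-1.36.1.tar.bz2": tarball}

used, _ = fetch_with_mirrors(
    "https://busybox.net/downloads/busybox-1.36.1.tar.bz2",
    premirrors=["file:///srv/premirror/busybox-1.36.1.tar.bz2"],
    mirrors=[],
    expected_sha256=digest,
    download=available.get,
)
print(used)  # file:///srv/premirror/busybox-1.36.1.tar.bz2
```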

What happens if there are two items in SRC_URI? If we patch the tarball
with other entries in SRC_URI, is the PURL still valid?

What happens in the cases where the recipe uses git to fetch the
sources instead of a tarball?

Can the external tooling not look at the url data already in the SPDX
output and work out the purls itself if it wants to?

I guess what I'm saying is we're trying to avoid too much "processing"
of the data we put into the SPDX so I'm cautious about duplicating
info. If the purl is always derived from the SRC_URI and we include
that, should we be adding the extra data?

I'm not trying to be negative, I'm just worried about where this might
lead and the corner cases that may be involved.

Copying Joshua, who I suspect may also have thoughts on this.

Cheers,

Richard







* Re: [OE-core] [RFC] Adding PURL identifiers to SPDX 3.0.1 install package elements
  2026-04-08 21:23 ` [OE-core] " Richard Purdie
@ 2026-04-09  8:55   ` Martin von Willebrand
  2026-04-09 13:21     ` Joshua Watt
From: Martin von Willebrand @ 2026-04-09  8:55 UTC (permalink / raw)
  To: Richard Purdie, openembedded-core; +Cc: Joshua Watt

On 4/9/2026 12:23 AM, Richard Purdie wrote:
> [...]

Thanks for the detailed response and for raising the edge cases. I'll try
to address them below:

On mirrors and premirrors: yes, we would use the canonical upstream SRC_URI,
not the actual fetch location. The upstream URI is the correct one for CVE
matching purposes; the point is not to record build-time fetch provenance.

On multiple SRC_URI entries and patches: a PURL would identify the upstream
package, not the downstream patched artefact — consistent with how the CPE is
already handled. CVEs are filed against upstream versions, and that is what
both identifiers reference.

On git sources: you're right, a git fetch produces a revision-pinned URI that
does not map cleanly to a standard PURL. Scoping an initial contribution to
non-git fetchers would sidestep this for now and cover the majority of cases.
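Scoping to non-git fetchers, and setting patches aside, amounts to filtering SRC_URI entries. Below is a rough Python sketch that assumes SRC_URI has already been expanded (no ${PV}) and uses a hypothetical `upstream_candidates()` heuristic. Inside OE this would more likely reuse BitBake's own URL decoding in bb.fetch2.

```python
from urllib.parse import urlsplit

REMOTE_ARCHIVE_SCHEMES = {"http", "https", "ftp"}
PATCH_SUFFIXES = (".patch", ".diff")


def split_entry(entry: str) -> tuple:
    """Split 'url;key=value;...' into (url, {key: value})."""
    url, *raw_params = entry.split(";")
    params = dict(p.split("=", 1) for p in raw_params if "=" in p)
    return url, params


def upstream_candidates(src_uri: str) -> list:
    """Return SRC_URI URLs that plausibly name the upstream release archive.

    Heuristic sketch: skips local files, git (and other VCS) fetches and
    anything that looks like a patch. It does not expand variables.
    """
    result = []
    for entry in src_uri.split():
        url, params = split_entry(entry)
        if urlsplit(url).scheme not in REMOTE_ARCHIVE_SCHEMES:
            continue  # file://, git://, gitsm://, ...
        if url.endswith(PATCH_SUFFIXES) or params.get("apply") == "yes":
            continue  # remote patch applied on top of the upstream archive
        result.append(url)
    return result


src_uri = (
    "https://busybox.net/downloads/busybox-1.36.1.tar.bz2 "
    "file://0001-example-fix.patch "
    "git://git.example.org/extra.git;protocol=https;branch=main"
)
print(upstream_candidates(src_uri))
# ['https://busybox.net/downloads/busybox-1.36.1.tar.bz2']
```

If more than one entry survives the filter, picking "the" upstream archive is exactly the ambiguity raised above, and the sketch does not resolve it.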

On whether consumers can derive the PURL themselves: for the generic case
(pkg:generic/busybox@1.36.1), yes. Name and version are already on the
install package element, and any consumer can construct that trivially, as
long as they understand Yocto's package structure.

The real value of doing this in OE lies in the ecosystem-typed cases: pkg:pypi,
pkg:npm, pkg:cpan, and similar. The bbclass inheritance information that
determines the ecosystem type (inherit pypi, inherit npm, inherit cpan) exists
only in the recipes and cannot be recovered by downstream consumers from
the SPDX output, with or without SPDX_INCLUDE_SOURCES. As far as I can tell,
that information is available to OE at build time and nowhere else in the chain.

That said, we would propose emitting PURLs on all install package elements:
ecosystem-typed where the recipe provides that information, pkg:generic
otherwise. Partial coverage would produce an inconsistent SPDX document in
which some packages have PURLs and others do not, which is worse than either
extreme. Even the generic PURL, although derivable, removes the need for
consumers to understand Yocto's package structure at all.
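The fallback described above could look roughly like this in Python. Everything here is an assumption for illustration: `choose_purl()` is hypothetical, `inherits` stands in for the recipe's inherited classes, and `upstream_name` stands in for the ecosystem's own package name (for example PYPI_PACKAGE). Only pypi and npm are shown; pkg:cpan and other types would need their own naming rules. The pypi normalisation (lowercase, '_' to '-') and the '%40' encoding of an npm scope follow the PURL type definitions.

```python
from urllib.parse import quote


def pypi_normalize(name: str) -> str:
    """PURL pypi naming rule: lowercase, '_' replaced by '-'."""
    return name.lower().replace("_", "-")


def choose_purl(inherits: set, upstream_name: str, version: str) -> str:
    """Return an ecosystem-typed PURL when the (hypothetical) set of inherited
    classes identifies one, otherwise fall back to pkg:generic."""
    v = quote(version, safe="")
    if "pypi" in inherits:
        return f"pkg:pypi/{quote(pypi_normalize(upstream_name), safe='')}@{v}"
    if "npm" in inherits:
        # A scoped npm name "@scope/pkg" maps to namespace "%40scope".
        scope, _, bare = upstream_name.rpartition("/")
        namespace = quote(scope, safe="") + "/" if scope else ""
        return f"pkg:npm/{namespace}{quote(bare, safe='')}@{v}"
    # No ecosystem information: generic PURL from name and version only.
    return f"pkg:generic/{quote(upstream_name, safe='')}@{v}"


print(choose_purl({"pypi", "setuptools3"}, "Foo_Bar", "2.0"))  # pkg:pypi/foo-bar@2.0
print(choose_purl({"npm"}, "@angular/core", "17.0.0"))         # pkg:npm/%40angular/core@17.0.0
print(choose_purl({"autotools"}, "busybox", "1.36.1"))         # pkg:generic/busybox@1.36.1
```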

The broader argument for doing this in OE: PURL is now an ECMA standard
(ECMA-427) and is on the path to becoming an ISO standard. It is the identifier
that the supply chain tooling ecosystem (ORT, Dependency-Track, CycloneDX and
others) is converging on for package matching. OE already takes the position
that canonical package identifiers belong in the SPDX output by emitting CPE
on install package elements. PURL is the natural complement to that.

It is of course the community's call whether this belongs in OE or downstream. If
it doesn't belong in OE, we could instead document the (limited) derivation
algorithm so consumers can implement it consistently. But to me OE looks like
the right place: the ecosystem-typed cases, removing the need to understand
Yocto's package structure, and explicit PURL support together make it worth
doing in OE rather than leaving it to downstream.

Thanks, Martin




* Re: [OE-core] [RFC] Adding PURL identifiers to SPDX 3.0.1 install package elements
  2026-04-09  8:55   ` Martin von Willebrand
@ 2026-04-09 13:21     ` Joshua Watt
  2026-04-10  7:46       ` Martin von Willebrand
From: Joshua Watt @ 2026-04-09 13:21 UTC (permalink / raw)
  To: Martin von Willebrand; +Cc: Richard Purdie, OE-core, Stefano Tondo


On Thu, Apr 9, 2026, 2:56 AM Martin von Willebrand <martin.vonwillebrand@doubleopen.io> wrote:

> [...]


Can you try again with SPDX output from the latest master branch of Yocto?
Stefano (CC'd) recently did a lot of work in this area.





* Re: [OE-core] [RFC] Adding PURL identifiers to SPDX 3.0.1 install package elements
  2026-04-09 13:21     ` Joshua Watt
@ 2026-04-10  7:46       ` Martin von Willebrand
From: Martin von Willebrand @ 2026-04-10  7:46 UTC (permalink / raw)
  To: Joshua Watt; +Cc: Richard Purdie, OE-core, Stefano Tondo

On 4/9/2026 4:21 PM, Joshua Watt wrote:
> Can you try again with spdx output from the latest master branch of yocto?

Hi Joshua,

Thanks for the pointer to master — I've now tested with a fresh OE-Core master
build. A few follow-up questions:

On PURL: I can see that pkg:yocto PURLs and ecosystem-typed PURLs (pkg:pypi,
pkg:npm, pkg:cpan) are already in master — that's great, and addresses the gap
I raised. Are there plans to backport this to scarthgap/5.0? And would
contributions toward that backport be welcome?

On the SPDX output format: master produces a distributed, content-addressed
store rather than the self-contained single-file SBOM that the scarthgap
backport generates. The rationale for the single-file format in 5.0 seems
clear: a self-contained document that downstream tools can consume without
understanding the Yocto store layout. Is that self-contained output planned
for wrynose/6.0, or is the expectation that downstream tools will assemble
the full SBOM from the distributed store themselves?
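To gauge what the second option asks of downstream tools, here is a rough Python sketch of merging already-loaded SPDX 3.0.1 JSON-LD documents into a single @graph. `merge_spdx_documents()` is hypothetical. Locating files in the store is not covered, and the result has no new SpdxDocument element and no check that every referenced spdxId is present.

```python
import json
from typing import Any, Dict, Iterable, List


def _rename_blank_nodes(obj: Any, mapping: Dict[str, str]) -> Any:
    """Return a copy of obj with blank-node ids replaced via mapping."""
    if isinstance(obj, dict):
        return {k: _rename_blank_nodes(v, mapping) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_rename_blank_nodes(v, mapping) for v in obj]
    if isinstance(obj, str):
        return mapping.get(obj, obj)
    return obj


def merge_spdx_documents(docs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge SPDX 3 JSON-LD documents into one @graph.

    Elements are de-duplicated by spdxId (first occurrence wins). Blank
    nodes ("_:...") are only unique within one file, so they are renamed
    per input document.
    """
    context = None
    merged: List[Dict[str, Any]] = []
    seen_ids = set()
    for n, doc in enumerate(docs):
        if context is None:
            context = doc.get("@context")
        graph = doc.get("@graph", [])
        mapping = {
            obj["@id"]: f"_:doc{n}-{obj['@id'][2:]}"
            for obj in graph
            if isinstance(obj.get("@id"), str) and obj["@id"].startswith("_:")
        }
        for obj in graph:
            obj = _rename_blank_nodes(obj, mapping)
            spdx_id = obj.get("spdxId")
            if spdx_id is not None:
                if spdx_id in seen_ids:
                    continue
                seen_ids.add(spdx_id)
            merged.append(obj)
    return {"@context": context, "@graph": merged}


# Two tiny hand-written inputs that share one package element.
a = {"@context": "ctx", "@graph": [
    {"type": "CreationInfo", "@id": "_:CreationInfo1", "specVersion": "3.0.1"},
    {"type": "software_Package", "spdxId": "urn:example:busybox",
     "creationInfo": "_:CreationInfo1"},
]}
b = {"@context": "ctx", "@graph": [
    {"type": "CreationInfo", "@id": "_:CreationInfo1", "specVersion": "3.0.1"},
    {"type": "software_Package", "spdxId": "urn:example:busybox",
     "creationInfo": "_:CreationInfo1"},
    {"type": "software_Package", "spdxId": "urn:example:zlib",
     "creationInfo": "_:CreationInfo1"},
]}
out = merge_spdx_documents([a, b])
print(json.dumps([o.get("spdxId", o.get("@id")) for o in out["@graph"]]))
# ["_:doc0-CreationInfo1", "urn:example:busybox", "_:doc1-CreationInfo1", "urn:example:zlib"]
```

Renaming blank nodes per input matters because identifiers such as `_:CreationInfo1` are only unique within one file.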

Thanks, Martin



