From: Joshua Watt <jpewhacker@gmail.com>
To: Scott Murray <scott.murray@konsulko.com>,
Saul Wold <Saul.Wold@windriver.com>
Cc: openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [PATCH v2] create-spdx: Get SPDX-License-Identifier from source
Date: Mon, 7 Feb 2022 14:35:45 -0600
Message-ID: <2e636f2e-dba9-e336-8060-9e8cce40cedb@gmail.com>
In-Reply-To: <5aebf892-1a3d-9647-3490-39242941653@spiteful.org>
On 2/7/22 14:33, Scott Murray wrote:
> On Mon, 7 Feb 2022, Saul Wold wrote:
>
>> This patch will read the beginning of source files and try to find
>> the SPDX-License-Identifier to populate the licenseInfoInFiles
>> field for each source file. This does not populate licenseConcluded
>> at this time, nor does it roll it up to the package level.
>>
>> We read each file in binary mode, since some source files seem to
>> contain binary characters; the matched licenses are then converted
>> to ASCII strings.
>>
>> Signed-off-by: Saul Wold <saul.wold@windriver.com>
>> ---
>> v2: Updated commit message, and fixed the regex based on Peter's suggestion
>>
>> meta/classes/create-spdx.bbclass | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/meta/classes/create-spdx.bbclass b/meta/classes/create-spdx.bbclass
>> index 8b4203fdb5..588489cc2b 100644
>> --- a/meta/classes/create-spdx.bbclass
>> +++ b/meta/classes/create-spdx.bbclass
>> @@ -37,6 +37,24 @@ SPDX_SUPPLIER[doc] = "The SPDX PackageSupplier field for SPDX packages created f
>>
>> do_image_complete[depends] = "virtual/kernel:do_create_spdx"
>>
>> +def extract_licenses(filename):
>> +    import re
>> +    import oe.spdx
>> +
>> +    lic_regex = re.compile(b'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)[ |\n|\r\n]*?')
>> +
>> +    try:
>> +        with open(filename, 'rb') as f:
>> +            size = min(15000, os.stat(filename).st_size)
>> +            txt = f.read(size)
>> +            licenses = re.findall(lic_regex, txt)
>> +            if licenses:
>> +                ascii_licenses = [lic.decode('ascii') for lic in licenses]
>> +                return ascii_licenses
>> +    except Exception as e:
>> +        bb.warn(f"Exception reading {filename}: {e}")
>> +    return None
>> +
>> def get_doc_namespace(d, doc):
>> import uuid
>> namespace_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, d.getVar("SPDX_UUID_NAMESPACE"))
>> @@ -232,6 +250,11 @@ def add_package_files(d, doc, spdx_pkg, topdir, get_spdxid, get_types, *, archiv
>> checksumValue=bb.utils.sha256_file(filepath),
>> ))
>>
>> + if "SOURCE" in spdx_file.fileTypes:
>> + extracted_lics = extract_licenses(filepath)
>> + if extracted_lics:
>> + spdx_file.licenseInfoInFiles = extracted_lics
>> +
>> doc.files.append(spdx_file)
>> doc.add_relationship(spdx_pkg, "CONTAINS", spdx_file)
>> spdx_pkg.hasFiles.append(spdx_file.SPDXID)
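For reference, a minimal standalone sketch of the same idea as extract_licenses() above: search the first 15000 bytes in binary mode, then decode only the matched identifiers as ASCII. The helper names find_spdx_ids and scan_file_for_spdx_ids are hypothetical, and the pattern is an assumed variant rather than the one in the patch (the differences are listed in the comments). Error handling, which the patch does with bb.warn, is left out.

```python
import re

# Hypothetical variant of the pattern in the patch above (not the submitted code):
#  - rb'' literal, so \s, \d and \t reach the regex engine without string-escape warnings
#  - "[ \t]*" instead of "\s+", so a match cannot continue onto the next line
#  - "+", "(" and ")" allowed, so "GPL-2.0+" and "(MIT OR Apache-2.0)" survive intact
#  - the patch's trailing "[ |\n|\r\n]*?" is dropped: it is a lazy character class at
#    the end of the pattern, so it always matches the empty string anyway
LIC_REGEX = re.compile(rb'SPDX-License-Identifier:[ \t]*([-A-Za-z\d.+() ]+)')

# Same read limit as the patch
MAX_BYTES = 15000


def find_spdx_ids(data):
    """Return the SPDX license expressions found in the first MAX_BYTES of data."""
    ids = []
    for match in LIC_REGEX.findall(data[:MAX_BYTES]):
        # The character class only admits ASCII bytes, so this decode cannot fail.
        # strip() drops spaces captured before e.g. a closing "*/".
        text = match.decode('ascii').strip()
        if text:  # skip identifiers whose value was blank
            ids.append(text)
    return ids


def scan_file_for_spdx_ids(filename):
    """Hypothetical file-level wrapper; errors propagate instead of bb.warn()."""
    # Binary mode, as in the patch: non-UTF-8 bytes elsewhere in the file are harmless
    with open(filename, 'rb') as f:
        return find_spdx_ids(f.read(MAX_BYTES))


# Bytes that are not valid UTF-8, followed by two typical identifier lines
sample = (b'\x89\xff\x00/* SPDX-License-Identifier: GPL-2.0+ */\n'
          b'# SPDX-License-Identifier: (MIT OR Apache-2.0)\n')
print(find_spdx_ids(sample))  # ['GPL-2.0+', '(MIT OR Apache-2.0)']
```

Keeping the search on bytes means a stray non-UTF-8 byte never aborts the scan, whereas sample.decode('utf-8') on the same input would raise UnicodeDecodeError.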
> IMO this seems like perhaps either going too far, or not far enough. If
> we go to the trouble to scan source files for explicit SPDX license
> declarations, but do not go as far as pattern detection like the
> meta-spdxscanner layer does with its use of Scancode Toolkit
> (https://github.com/nexB/scancode-toolkit), then it seems there's
> more potential for giving users a false impression as to the completeness
> of the resulting report/SBOM. Perhaps that can be handled by making it
> very clear that further scanning and auditing is still required in the
> hopefully forthcoming create-spdx.bbclass documentation, but I can
> imagine having to explain this to customers.
Can you give an overview of what meta-spdxscanner does? I'm not quite
clear on what extra processing would be required here.
>
> Scott