Openembedded Core Discussions
From: Joshua Watt <jpewhacker@gmail.com>
To: Scott Murray <scott.murray@konsulko.com>,
	Saul Wold <Saul.Wold@windriver.com>
Cc: openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [PATCH v2] create-spdx: Get SPDX-License-Identifier from source
Date: Mon, 7 Feb 2022 14:35:45 -0600
Message-ID: <2e636f2e-dba9-e336-8060-9e8cce40cedb@gmail.com>
In-Reply-To: <5aebf892-1a3d-9647-3490-39242941653@spiteful.org>


On 2/7/22 14:33, Scott Murray wrote:
> On Mon, 7 Feb 2022, Saul Wold wrote:
>
>> This patch will read the beginning of source files and try to find
>> the SPDX-License-Identifier to populate the licenseInfoInFiles
>> field for each source file. This does not populate licenseConcluded
>> at this time, nor rolls it up to package level.
>>
>> We read each file as binary since some source code seems to contain
>> binary characters; the license is then converted to ASCII strings.
>>
>> Signed-off-by: Saul Wold <saul.wold@windriver.com>
>> ---
>> v2: Updated commit message, and fixed REGEX based on Peter's suggestion
>>
>>   meta/classes/create-spdx.bbclass | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/meta/classes/create-spdx.bbclass b/meta/classes/create-spdx.bbclass
>> index 8b4203fdb5..588489cc2b 100644
>> --- a/meta/classes/create-spdx.bbclass
>> +++ b/meta/classes/create-spdx.bbclass
>> @@ -37,6 +37,24 @@ SPDX_SUPPLIER[doc] = "The SPDX PackageSupplier field for SPDX packages created f
>>
>>   do_image_complete[depends] = "virtual/kernel:do_create_spdx"
>>
>> +def extract_licenses(filename):
>> +    import re
>> +    import oe.spdx
>> +
>> +    lic_regex = re.compile(b'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)[ |\n|\r\n]*?')
>> +
>> +    try:
>> +        with open(filename, 'rb') as f:
>> +            size = min(15000, os.stat(filename).st_size)
>> +            txt = f.read(size)
>> +            licenses = re.findall(lic_regex, txt)
>> +            if licenses:
>> +                ascii_licenses = [lic.decode('ascii') for lic in licenses]
>> +                return ascii_licenses
>> +    except Exception as e:
>> +        bb.warn(f"Exception reading {filename}: {e}")
>> +    return None
>> +
>>   def get_doc_namespace(d, doc):
>>       import uuid
>>       namespace_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, d.getVar("SPDX_UUID_NAMESPACE"))
>> @@ -232,6 +250,11 @@ def add_package_files(d, doc, spdx_pkg, topdir, get_spdxid, get_types, *, archiv
>>                           checksumValue=bb.utils.sha256_file(filepath),
>>                       ))
>>
>> +                if "SOURCE" in spdx_file.fileTypes:
>> +                    extracted_lics = extract_licenses(filepath)
>> +                    if extracted_lics:
>> +                        spdx_file.licenseInfoInFiles = extracted_lics
>> +
>>                   doc.files.append(spdx_file)
>>                   doc.add_relationship(spdx_pkg, "CONTAINS", spdx_file)
>>                   spdx_pkg.hasFiles.append(spdx_file.SPDXID)
> IMO this seems like perhaps either going too far, or not far enough.  If
> we go to the trouble to scan source files for explicit SPDX license
> declarations, but do not go as far as pattern detection like the
> meta-spdxscanner layer does with its use of ScanCode Toolkit
> (https://github.com/nexB/scancode-toolkit), then it seems there's
> more potential for giving users a false impression as to the completeness
> of the resulting report/SBOM.  Perhaps that can be handled by making it
> very clear that further scanning and auditing is still required in the
> hopefully forthcoming create-spdx.bbclass documentation, but I can
> imagine having to explain this to customers.

Can you give an overview of what meta-spdxscanner does? I'm not quite
clear what extra processing would be required here.
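
For reference, the extraction step in the patch amounts to roughly the
following. This is a standalone sketch rather than the bbclass code: the
helper name find_license_ids, the max_bytes parameter and the simplified
regex are assumptions made for illustration, and unlike the patch it
returns an empty list instead of None and lets I/O errors propagate
instead of calling bb.warn.

```python
import re

# Assumption: a simplified pattern, not the exact regex from the patch.
# It captures the expression that follows the tag up to the first byte
# that cannot be part of a license expression (newline, '*', and so on).
LIC_REGEX = re.compile(rb"SPDX-License-Identifier:[ \t]*([-A-Za-z0-9.+() ]+)")


def find_license_ids(data):
    """Return every SPDX license expression declared in a bytes buffer."""
    found = []
    for raw in LIC_REGEX.findall(data):
        # The character class only admits ASCII bytes, so decoding is safe.
        text = raw.decode("ascii").strip()
        if text:
            found.append(text)
    return found


def extract_licenses(filename, max_bytes=15000):
    """Scan only the beginning of a file, read as binary, for SPDX tags."""
    with open(filename, "rb") as f:
        return find_license_ids(f.read(max_bytes))
```

Only explicit SPDX-License-Identifier tags are picked up this way; a
file that carries license text without such a tag yields nothing.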

>
> Scott

