From: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
To: Saul Wold <Saul.Wold@windriver.com>,
"openembedded-core@lists.openembedded.org"
<openembedded-core@lists.openembedded.org>,
"JPEWhacker@gmail.com" <JPEWhacker@gmail.com>
Subject: RE: [OE-core] [PATCH] create-spdx: Get SPDX-License-Identifier from source
Date: Wed, 2 Feb 2022 03:21:51 +0000
Message-ID: <f7190e1f3a454459b0d5ffc2202f9626@axis.com>
In-Reply-To: <20220202000148.1462-1-saul.wold@windriver.com>
> -----Original Message-----
> From: openembedded-core@lists.openembedded.org <openembedded-core@lists.openembedded.org> On Behalf Of Saul Wold
> Sent: den 2 februari 2022 01:02
> To: openembedded-core@lists.openembedded.org; JPEWhacker@gmail.com
> Cc: Saul Wold <saul.wold@windriver.com>
> Subject: [OE-core] [PATCH] create-spdx: Get SPDX-License-Identifier from source
>
> This patch will read the begining of source files and try to find
> the SPDX-License-Identifier to populate the licenseInfoInFiles
> field for each source file. This does not populate licenseConculed
I assume that should be "licenseConcluded".
> at this time, nor rolls it up to package level.
>
> We read as binary to since some source code seem to have some
to -> too
> binary characters, the license is then converted to ascii strings.
>
> Signed-off-by: Saul Wold <saul.wold@windriver.com>
> ---
> Merge after Joshua's patch (spdx: Add set helper for list properties)
> merges
>
> meta/classes/create-spdx.bbclass | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/meta/classes/create-spdx.bbclass b/meta/classes/create-spdx.bbclass
> index 8b4203fdb5d..588489cc2b0 100644
> --- a/meta/classes/create-spdx.bbclass
> +++ b/meta/classes/create-spdx.bbclass
> @@ -37,6 +37,24 @@ SPDX_SUPPLIER[doc] = "The SPDX PackageSupplier field for SPDX packages created f
>
> do_image_complete[depends] = "virtual/kernel:do_create_spdx"
>
> +def extract_licenses(filename):
> +    import re
> +    import oe.spdx
You do not use oe.spdx in this function.
> +
> +    lic_regex = re.compile(b'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)[ |\n|\r\n]*?')
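I assume you meant a group rather than a character class:
lic_regex = re.compile(b'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)(?: |\n|\r\n)*?')
Not that it really matters, though: a lazy "*?" at the very end of the
pattern always matches nothing, so it will yield the same result as: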
lic_regex = re.compile(b'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)')
However, none of the expressions above will correctly match all the
SPDX-License-Identifier examples at https://spdx.dev/ids/#how.
Use this instead:
lic_regex = re.compile(b'^\W*SPDX-License-Identifier:\s*([ \w\d.()+-]+?)(?:\s+\W*)?$', re.MULTILINE)
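A minimal sketch for comparing the two expressions outside of BitBake.
The sample headers and the helper name find_license_ids are made up for
illustration and are not taken from the patch or from the SPDX page. The
patterns are written as raw bytes literals (rb'...') so the backslash
escapes reach the regex engine unchanged.

```python
import re

# Simplified form of the expression from the patch.
OLD_REGEX = re.compile(rb'SPDX-License-Identifier:\s+([-A-Za-z\d. ]+)')

# Suggested replacement, anchored to whole lines via re.MULTILINE.
NEW_REGEX = re.compile(
    rb'^\W*SPDX-License-Identifier:\s*([ \w\d.()+-]+?)(?:\s+\W*)?$',
    re.MULTILINE,
)

# Hypothetical headers in a few common comment styles.
SAMPLE = (
    b"// SPDX-License-Identifier: MIT\n"
    b"/* SPDX-License-Identifier: (GPL-2.0-only OR MIT) */\n"
    b"# SPDX-License-Identifier: GPL-2.0+\n"
    b" * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception\n"
)


def find_license_ids(regex, data):
    """Return every captured license expression as an ASCII string."""
    return [match.decode('ascii') for match in regex.findall(data)]


if __name__ == "__main__":
    print("old:", find_license_ids(OLD_REGEX, SAMPLE))
    print("new:", find_license_ids(NEW_REGEX, SAMPLE))
```

With these samples the old expression skips the parenthesized line and
drops the trailing "+", while the new one returns all four expressions
without the surrounding comment markers.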
> +
> +    try:
> +        with open(filename, 'rb') as f:
> +            size = min(15000, os.stat(filename).st_size)
> +            txt = f.read(size)
> +            licenses = re.findall(lic_regex, txt)
> +            if licenses:
> +                ascii_licenses = [lic.decode('ascii') for lic in licenses]
> +                return ascii_licenses
> +    except Exception as e:
> +        bb.warn(f"Exception reading {filename}: {e}")
> +    return None
> +
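For trying the function outside of a build, a standalone sketch along
these lines may help. It is an assumption-laden variant, not the patch
itself: logging stands in for bb.warn, the name MAX_HEADER_BYTES is
invented here, and the os.stat() call is dropped because read() already
returns fewer bytes for short files.

```python
import logging
import re

logger = logging.getLogger("spdx-sketch")  # stand-in for bb.warn

LIC_REGEX = re.compile(
    rb'^\W*SPDX-License-Identifier:\s*([ \w\d.()+-]+?)(?:\s+\W*)?$',
    re.MULTILINE,
)

MAX_HEADER_BYTES = 15000  # same limit as in the patch


def extract_licenses(filename):
    """Return the SPDX identifiers found near the top of a file, or None."""
    try:
        # Binary mode, since some source files contain non-text bytes.
        with open(filename, 'rb') as f:
            txt = f.read(MAX_HEADER_BYTES)
    except OSError as e:
        logger.warning("Exception reading %s: %s", filename, e)
        return None
    # The bytes pattern only captures ASCII characters, so decoding the
    # captured groups as ASCII cannot fail.
    licenses = [lic.decode('ascii') for lic in LIC_REGEX.findall(txt)]
    return licenses or None
```

Calling extract_licenses() on a file whose first line is
"/* SPDX-License-Identifier: MIT */" should return ['MIT'] even if later
lines contain binary data, and a missing file yields None plus a warning.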
> def get_doc_namespace(d, doc):
> import uuid
> namespace_uuid = uuid.uuid5(uuid.NAMESPACE_DNS, d.getVar("SPDX_UUID_NAMESPACE"))
> @@ -232,6 +250,11 @@ def add_package_files(d, doc, spdx_pkg, topdir, get_spdxid, get_types, *, archiv
> checksumValue=bb.utils.sha256_file(filepath),
> ))
>
> + if "SOURCE" in spdx_file.fileTypes:
> + extracted_lics = extract_licenses(filepath)
> + if extracted_lics:
> + spdx_file.licenseInfoInFiles = extracted_lics
> +
> doc.files.append(spdx_file)
> doc.add_relationship(spdx_pkg, "CONTAINS", spdx_file)
> spdx_pkg.hasFiles.append(spdx_file.SPDXID)
> --
> 2.31.1
//Peter