From: Thomas Gleixner <tglx@linutronix.de>
To: linux-spdx@vger.kernel.org
Subject: SPDX in the kernel: State of the union
Date: Wed, 18 May 2022 01:31:42 +0200
Message-ID: <87zgjfka75.ffs@tglx>
Folks!
After the initial SPDX effort, which ended about three years ago, there
has not been much progress, either in terms of file statistics or in
terms of activity on this list... I'm refraining from asking the obvious
questions...
Nevertheless I'm trying to free up some cycles to get this rolling
again.
As a first step I tried to resurrect my old scripts. That was not really an
enjoyable experience due to the python2 -> python3 fallout and the changes
in scancode since then.
Though after quite some cursing I was able to gather at least initial
statistics and to analyze patches based on the scancode detection rules.
I surely have quite a few words to say about the 'improved' scancode
detection rules too, but I'll sort that out with Philippe off-list.
So here is where we are:
Files without SPDX identifier:    16410   ~22% of total files
Files without any license hint:    7131   ~43% of !SPDX'ed files
Files with one license hint:       6673   ~40% of !SPDX'ed files
Files with two license hints:      2267   ~13% of !SPDX'ed files
Files with more than two hints:     339   ~ 2% of !SPDX'ed files
Files with fewer than 4 lines of content:
  0 length:   33 (some can be removed)
  1 line:    276
  2 lines:   109
  3 lines:   135
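For illustration only, counts of this kind can be gathered with a short
tree walk. The sketch below is not the script used for the numbers above;
the names (survey, has_spdx_tag), the five-line search window and the
decision to bin only !SPDX'ed files by length are assumptions made for
the example:

```python
import os

SPDX_TAG = "SPDX-License-Identifier:"

def has_spdx_tag(path, max_lines=5):
    """Return True if the SPDX tag shows up in the first max_lines lines."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for _, line in zip(range(max_lines), fh):
            if SPDX_TAG in line:
                return True
    return False

def count_lines(path):
    """Count lines in a file; an empty file yields 0."""
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)

def survey(root):
    """Walk root and count files lacking an SPDX tag, binned by length."""
    stats = {"total": 0, "no_spdx": 0, "short": {0: 0, 1: 0, 2: 0, 3: 0}}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the git metadata directory (assumption: a git checkout).
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            stats["total"] += 1
            if has_spdx_tag(path):
                continue
            stats["no_spdx"] += 1
            lines = count_lines(path)
            if lines in stats["short"]:
                stats["short"][lines] += 1
    return stats
```

Calling survey() on the top of a kernel checkout returns the totals as a
dictionary. Files without any license hint still need a license scanner
such as scancode on top of this; the sketch only looks for the SPDX tag.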
Files without any license hint:
  arch             774
  block              1
  certs              2
  crypto            10
  Documentation   4266
  drivers          320
  fs                26
  include          124
  init               0
  ipc                0
  kernel            14
  lib               26
  mm                 3
  net               15
  samples            7
  scripts           63
  security           8
  sound              9
  tools           1457
  usr                0
  virt               0
Files with one license hint:
  arch            1405
  block              0
  certs              1
  crypto             1
  Documentation     65
  drivers         4369
  fs               126
  include          356
  init               0
  ipc                1
  kernel            18
  lib               35
  mm                 4
  net               69
  samples           14
  scripts           26
  security           0
  sound             40
  tools            141
  usr                1
  virt               0
Files with two license hints:
  arch             731
  block              0
  certs              0
  crypto             3
  Documentation     13
  drivers         1114
  fs                66
  include          101
  init               0
  ipc                0
  kernel             0
  lib               54
  mm                 0
  net               91
  samples           39
  scripts            5
  security           1
  sound             14
  tools             35
  usr                0
  virt               0
Script-able files with reasonable effort:
  No hint:     6501   ~90% of no-hint files
  One hint:    5129   ~76% of one-hint files
  Two hints:    584   ~25% of two-hint files
  Total:      12213   ~75% of !SPDX'ed files
  Remaining:   4197   ~5% of total files
Scancode rules involved:   561
Scancode rules validated:  117
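The bucket counts and the remainder can be cross-checked with a few lines
of Python. This is only arithmetic on the numbers quoted in this mail, not
output from the actual scripts, and the variable names are made up for the
example:

```python
# Counts quoted in the statistics above.
no_spdx = 16410
hint_buckets = {
    "no hint": 7131,
    "one hint": 6673,
    "two hints": 2267,
    "more than two hints": 339,
}
scriptable_total = 12213  # "Total" line of the script-able summary

def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# The four hint buckets should partition the !SPDX'ed files.
assert sum(hint_buckets.values()) == no_spdx

for name, count in hint_buckets.items():
    print(f"{name:>20}: {count:6d}  {share(count, no_spdx):5.1f}% of !SPDX'ed files")

# Whatever is not script-able is left for manual review.
remaining = no_spdx - scriptable_total
print(f"{'script-able':>20}: {scriptable_total:6d}  "
      f"{share(scriptable_total, no_spdx):5.1f}% of !SPDX'ed files")
print(f"{'remaining':>20}: {remaining:6d}")
```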
My plan is to focus on the 'low hanging' fruit of reasonably easy
script-able files first.
For the files with zero hints that requires a few questions to be answered
upfront:
1) What's the approach for files with obviously not copyright-able
   content:
   - Files which just include other file[s] (one or two lines)
   - Files which have just a more or less useful comment why they
     are otherwise empty (one to three lines)
   - Files which just contain a #define FOO and an include of
     another file to compile the included file with some other
     functionality (two or three lines)
2) What's the approach for machine generated files:
   - Primarily kernel configuration files
3) What's the approach for 'hidden' dot-files like .gitignore:
   Those files are just providing information to tools. The file format
   is defined by the tool (git, clang, coccinelle....) and the creative
   content is exactly zero...
4) What's the approach for binary blobs or other files which cannot carry
   license information in the file itself?
   This is related to the discussion in this thread:
   https://lore.kernel.org/all/20220516101901.475557433@linutronix.de
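To make the three sub-cases of question 1 concrete, here is a rough
sketch of how such files could be sorted into buckets for review. It is a
heuristic for C-style sources only; the function name classify, the
bucket labels and the three-line limit are assumptions for illustration,
and nothing here decides the legal question:

```python
import re

# One #include per line, e.g. '#include "foo.c"' or '#include <foo.h>'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+[<"][^>"]+[>"]\s*$')
# A simple object-like define, e.g. '#define FOO' or '#define FOO 1'.
DEFINE_RE = re.compile(r'^\s*#\s*define\s+\w+(\s+.*)?$')
# A '//' comment or a '/* ... */' comment that opens and closes on one line.
COMMENT_RE = re.compile(r'^\s*(//.*|/\*(?:(?!\*/).)*\*/)\s*$')

def classify(text, max_lines=3):
    """Sort a file's content into one of the buckets from question 1."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "empty"
    if len(lines) > max_lines:
        return "needs-review"
    kinds = set()
    for line in lines:
        if INCLUDE_RE.match(line):
            kinds.add("include")
        elif DEFINE_RE.match(line):
            kinds.add("define")
        elif COMMENT_RE.match(line):
            kinds.add("comment")
        else:
            # Anything else is real content and goes to manual review.
            return "needs-review"
    if kinds == {"include"}:
        return "include-only"
    if kinds == {"comment"}:
        return "comment-only"
    if kinds == {"define", "include"}:
        return "define-and-include"
    return "needs-review"
```

Anything the heuristic does not recognize, including mixed cases such as
a comment followed by an include, falls into "needs-review" on purpose.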
The other question for these files with zero hints is which license to
choose. Sure, you can argue that all files w/o any hint fall under the
project license, but especially the Documentation directory is interesting
as it's not clear for all of the various content what the preferred and
assumed license should be. That needs some thought and clarification.
For the kernel code itself that's not a real question, but the tools
directory might need some care too.
For the files which have a licensing hint in whatever form, I think
resuming the work where we left off, i.e. mainly reviewing per scancode
match rules based patterns, makes a lot of sense.
Based on my cursory validation of those patterns I'm confident that we can
reach 95% coverage within a reasonable amount of time.
Finally here is another round of important questions:
#1 Is there still interest in getting this done? The silence on this list
   after the initial effort is deafening.
#2 Are there still enough interested and competent people on this list to
   handle the legal questions?
#3 Was there any progress on the outstanding questions on this list where
   discussion dried up almost 3 years ago?
I'm willing to pull the cart again, but if the interest and support stays
around zero, I surely have other things to do.
Thanks,
Thomas