* [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode @ 2018-12-12 13:12 Thierry Reding 2018-12-12 13:12 ` [PATCH 2/2] scripts: Add spdxcheck.py self test Thierry Reding 2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline 0 siblings, 2 replies; 8+ messages in thread From: Thierry Reding @ 2018-12-12 13:12 UTC (permalink / raw) To: Andrew Morton, Thomas Gleixner Cc: Jonathan Corbet, Joe Perches, Jeremy Cline, Uwe Kleine-König, linux-kernel From: Thierry Reding <treding@nvidia.com> The spdxcheck script currently falls over when confronted with a binary file (such as Documentation/logo.gif). To avoid that, always open files in binary mode and decode line-by-line, ignoring encoding errors. One tricky case is when piping data into the script and reading it from standard input. By default, standard input will be opened in text mode, so we need to reopen it in binary mode. Signed-off-by: Thierry Reding <treding@nvidia.com> --- scripts/spdxcheck.py | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py index 5056fb3b897d..e559c6294c39 100755 --- a/scripts/spdxcheck.py +++ b/scripts/spdxcheck.py @@ -168,6 +168,7 @@ class id_parser(object): self.curline = 0 try: for line in fd: + line = line.decode(locale.getpreferredencoding(False), errors='ignore') self.curline += 1 if self.curline > maxlines: break @@ -249,12 +250,13 @@ if __name__ == '__main__': try: if len(args.path) and args.path[0] == '-': - parser.parse_lines(sys.stdin, args.maxlines, '-') + stdin = os.fdopen(sys.stdin.fileno(), 'rb') + parser.parse_lines(stdin, args.maxlines, '-') else: if args.path: for p in args.path: if os.path.isfile(p): - parser.parse_lines(open(p), args.maxlines, p) + parser.parse_lines(open(p, 'rb'), args.maxlines, p) elif os.path.isdir(p): scan_git_subtree(repo.head.reference.commit.tree, p) else: -- 2.19.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] scripts: Add spdxcheck.py self test 2018-12-12 13:12 [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Thierry Reding @ 2018-12-12 13:12 ` Thierry Reding 2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline 1 sibling, 0 replies; 8+ messages in thread From: Thierry Reding @ 2018-12-12 13:12 UTC (permalink / raw) To: Andrew Morton, Thomas Gleixner Cc: Jonathan Corbet, Joe Perches, Jeremy Cline, Uwe Kleine-König, linux-kernel From: Thierry Reding <treding@nvidia.com> Add a script that will run spdxcheck.py through a couple of self tests to simplify validation in the future. The tests are run for both Python 2 and Python 3 to make sure all changes to the script remain compatible across both versions. The script tests a regular text file (Makefile) for basic sanity checks and then runs it on a binary file (Documentation/logo.gif) to make sure it works in both cases. It also tests opening files passed on the command line as well as piped files read from standard input. Finally a run on the complete tree will be performed to catch any other potential issues. Signed-off-by: Thierry Reding <treding@nvidia.com> --- scripts/spdxcheck-test.sh | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100755 scripts/spdxcheck-test.sh diff --git a/scripts/spdxcheck-test.sh b/scripts/spdxcheck-test.sh new file mode 100755 index 000000000000..cfea6a0d1cc0 --- /dev/null +++ b/scripts/spdxcheck-test.sh @@ -0,0 +1,12 @@ +#!/bin/sh + +for PYTHON in python2 python3; do + # run check on a text and a binary file + for FILE in Makefile Documentation/logo.gif; do + $PYTHON scripts/spdxcheck.py $FILE + $PYTHON scripts/spdxcheck.py - < $FILE + done + + # run check on complete tree to catch any other issues + $PYTHON scripts/spdxcheck.py > /dev/null +done -- 2.19.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode 2018-12-12 13:12 [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Thierry Reding 2018-12-12 13:12 ` [PATCH 2/2] scripts: Add spdxcheck.py self test Thierry Reding @ 2018-12-12 18:14 ` Jeremy Cline 2018-12-13 2:51 ` Joe Perches 2018-12-13 7:37 ` Uwe Kleine-König 1 sibling, 2 replies; 8+ messages in thread From: Jeremy Cline @ 2018-12-12 18:14 UTC (permalink / raw) To: Thierry Reding Cc: Andrew Morton, Thomas Gleixner, Jonathan Corbet, Joe Perches, Uwe Kleine-König, linux-kernel Hi, On Wed, Dec 12, 2018 at 02:12:09PM +0100, Thierry Reding wrote: > From: Thierry Reding <treding@nvidia.com> > > The spdxcheck script currently falls over when confronted with a binary > file (such as Documentation/logo.gif). To avoid that, always open files > in binary mode and decode line-by-line, ignoring encoding errors. > > One tricky case is when piping data into the script and reading it from > standard input. By default, standard input will be opened in text mode, > so we need to reopen it in binary mode. > > Signed-off-by: Thierry Reding <treding@nvidia.com> > --- > scripts/spdxcheck.py | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py > index 5056fb3b897d..e559c6294c39 100755 > --- a/scripts/spdxcheck.py > +++ b/scripts/spdxcheck.py > @@ -168,6 +168,7 @@ class id_parser(object): > self.curline = 0 > try: > for line in fd: > + line = line.decode(locale.getpreferredencoding(False), errors='ignore') > self.curline += 1 > if self.curline > maxlines: > break > @@ -249,12 +250,13 @@ if __name__ == '__main__': > > try: > if len(args.path) and args.path[0] == '-': > - parser.parse_lines(sys.stdin, args.maxlines, '-') > + stdin = os.fdopen(sys.stdin.fileno(), 'rb') > + parser.parse_lines(stdin, args.maxlines, '-') > else: > if args.path: > for p in args.path: > if os.path.isfile(p): > - parser.parse_lines(open(p), args.maxlines, p) > + parser.parse_lines(open(p, 'rb'), args.maxlines, p) > elif os.path.isdir(p): > scan_git_subtree(repo.head.reference.commit.tree, p) > else: > -- > 2.19.1 > It might be worth noting this fixes commit 6f4d29df66ac ("scripts/spdxcheck.py: make python3 compliant") and also Cc this for stable since 6f4d29df66ac got backported to v4.19. While that commit did indeed make the script work with Python 3 for piping data, it broke Python 2 and made its way to stable. Reviewed-by: Jeremy Cline <jcline@redhat.com> Regards, Jeremy ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode 2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline @ 2018-12-13 2:51 ` Joe Perches 2018-12-13 16:14 ` Thierry Reding 2018-12-13 7:37 ` Uwe Kleine-König 1 sibling, 1 reply; 8+ messages in thread From: Joe Perches @ 2018-12-13 2:51 UTC (permalink / raw) To: Jeremy Cline, Thierry Reding Cc: Andrew Morton, Thomas Gleixner, Jonathan Corbet, Uwe Kleine-König, linux-kernel On Wed, 2018-12-12 at 13:14 -0500, Jeremy Cline wrote: > It might be worth noting this fixes commit 6f4d29df66ac > ("scripts/spdxcheck.py: make python3 compliant") umm, how does it do that? > and also Cc this for > stable since 6f4d29df66ac got backported to v4.19. While that commit > did indeed make the script work with Python 3 for piping data, it broke > Python 2 and made its way to stable. I think attempting to use spdxcheck on binary files is not useful in the first place. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode 2018-12-13 2:51 ` Joe Perches @ 2018-12-13 16:14 ` Thierry Reding 0 siblings, 0 replies; 8+ messages in thread From: Thierry Reding @ 2018-12-13 16:14 UTC (permalink / raw) To: Joe Perches Cc: Jeremy Cline, Andrew Morton, Thomas Gleixner, Jonathan Corbet, Uwe Kleine-König, linux-kernel [-- Attachment #1: Type: text/plain, Size: 888 bytes --] On Wed, Dec 12, 2018 at 06:51:19PM -0800, Joe Perches wrote: > On Wed, 2018-12-12 at 13:14 -0500, Jeremy Cline wrote: > > It might be worth noting this fixes commit 6f4d29df66ac > > ("scripts/spdxcheck.py: make python3 compliant") > > umm, how does it do that? > > > and also Cc this for > > stable since 6f4d29df66ac got backported to v4.19. While that commit > > did indeed make the script work with Python 3 for piping data, it broke > > Python 2 and made its way to stable. > > I think attempting to use spdxcheck on binary files > is not useful in the first place. Agreed. However, if run without arguments the tool will walk the entire tree and inevitably run into binary files. I realize that this is kind of a lame way of dealing with binary files, but the alternatives would be much more complicated and involve dealing with mimetypes and such. Thierry [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode 2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline 2018-12-13 2:51 ` Joe Perches @ 2018-12-13 7:37 ` Uwe Kleine-König 2018-12-13 15:10 ` Jeremy Cline 1 sibling, 1 reply; 8+ messages in thread From: Uwe Kleine-König @ 2018-12-13 7:37 UTC (permalink / raw) To: Jeremy Cline Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet, Joe Perches, linux-kernel On Wed, Dec 12, 2018 at 01:14:10PM -0500, Jeremy Cline wrote: > Hi, > > On Wed, Dec 12, 2018 at 02:12:09PM +0100, Thierry Reding wrote: > > From: Thierry Reding <treding@nvidia.com> > > > > The spdxcheck script currently falls over when confronted with a binary > > file (such as Documentation/logo.gif). To avoid that, always open files > > in binary mode and decode line-by-line, ignoring encoding errors. I suggest pointing out that the breakage only happens with python3 and results in a UnicodeDecodeError. > > One tricky case is when piping data into the script and reading it from > > standard input. By default, standard input will be opened in text mode, > > so we need to reopen it in binary mode. > > > > Signed-off-by: Thierry Reding <treding@nvidia.com> > > --- > > scripts/spdxcheck.py | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py > > index 5056fb3b897d..e559c6294c39 100755 > > --- a/scripts/spdxcheck.py > > +++ b/scripts/spdxcheck.py > > @@ -168,6 +168,7 @@ class id_parser(object): > > self.curline = 0 > > try: > > for line in fd: > > + line = line.decode(locale.getpreferredencoding(False), errors='ignore') > > self.curline += 1 > > if self.curline > maxlines: > > break > > @@ -249,12 +250,13 @@ if __name__ == '__main__': > > > > try: > > if len(args.path) and args.path[0] == '-': > > - parser.parse_lines(sys.stdin, args.maxlines, '-') > > + stdin = os.fdopen(sys.stdin.fileno(), 'rb') > > + parser.parse_lines(stdin, args.maxlines, '-') > > else: > > if args.path: > > for p in args.path: > > if os.path.isfile(p): > > - parser.parse_lines(open(p), args.maxlines, p) > > + parser.parse_lines(open(p, 'rb'), args.maxlines, p) > > elif os.path.isdir(p): > > scan_git_subtree(repo.head.reference.commit.tree, p) > > else: > > -- > > 2.19.1 > > > > It might be worth noting this fixes commit 6f4d29df66ac > ("scripts/spdxcheck.py: make python3 compliant") and also Cc this for > stable since 6f4d29df66ac got backported to v4.19. While that commit > did indeed make the script work with Python 3 for piping data, it broke > Python 2 and made its way to stable. It didn't break for me. Can you provide details about how and when it broke for you? Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode 2018-12-13 7:37 ` Uwe Kleine-König @ 2018-12-13 15:10 ` Jeremy Cline 2018-12-13 19:40 ` Uwe Kleine-König 0 siblings, 1 reply; 8+ messages in thread From: Jeremy Cline @ 2018-12-13 15:10 UTC (permalink / raw) To: Uwe Kleine-König Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet, Joe Perches, linux-kernel On Thu, Dec 13, 2018 at 08:37:08AM +0100, Uwe Kleine-König wrote: > On Wed, Dec 12, 2018 at 01:14:10PM -0500, Jeremy Cline wrote: > > Hi, > > > > On Wed, Dec 12, 2018 at 02:12:09PM +0100, Thierry Reding wrote: > > > From: Thierry Reding <treding@nvidia.com> > > > > > > The spdxcheck script currently falls over when confronted with a binary > > > file (such as Documentation/logo.gif). To avoid that, always open files > > > in binary mode and decode line-by-line, ignoring encoding errors. > > I suggest pointing out that the breakage only happens with python3 and > results in a UnicodeDecodeError. > > > > One tricky case is when piping data into the script and reading it from > > > standard input. By default, standard input will be opened in text mode, > > > so we need to reopen it in binary mode. > > > > > > Signed-off-by: Thierry Reding <treding@nvidia.com> > > > --- > > > scripts/spdxcheck.py | 6 ++++-- > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py > > > index 5056fb3b897d..e559c6294c39 100755 > > > --- a/scripts/spdxcheck.py > > > +++ b/scripts/spdxcheck.py > > > @@ -168,6 +168,7 @@ class id_parser(object): > > > self.curline = 0 > > > try: > > > for line in fd: > > > + line = line.decode(locale.getpreferredencoding(False), errors='ignore') > > > self.curline += 1 > > > if self.curline > maxlines: > > > break > > > @@ -249,12 +250,13 @@ if __name__ == '__main__': > > > > > > try: > > > if len(args.path) and args.path[0] == '-': > > > - parser.parse_lines(sys.stdin, args.maxlines, '-') > > > + stdin = os.fdopen(sys.stdin.fileno(), 'rb') > > > + parser.parse_lines(stdin, args.maxlines, '-') > > > else: > > > if args.path: > > > for p in args.path: > > > if os.path.isfile(p): > > > - parser.parse_lines(open(p), args.maxlines, p) > > > + parser.parse_lines(open(p, 'rb'), args.maxlines, p) > > > elif os.path.isdir(p): > > > scan_git_subtree(repo.head.reference.commit.tree, p) > > > else: > > > -- > > > 2.19.1 > > > > > > > It might be worth noting this fixes commit 6f4d29df66ac > > ("scripts/spdxcheck.py: make python3 compliant") and also Cc this for > > stable since 6f4d29df66ac got backported to v4.19. While that commit > > did indeed make the script work with Python 3 for piping data, it broke > > Python 2 and made its way to stable. > > It didn't break for me. Can you provide details about how and when it > broke for you? I was wrong about it being Python 2 that broke, sorry about that. 6f4d29df66ac broke Python 3 when you run it against a sub-tree because scan_git_tree() opens the files in binary mode, but then find is run with a text string: $ python3 scripts/spdxcheck.py net/ FAIL: argument should be integer or bytes-like object, not 'str' Traceback (most recent call last): File "scripts/spdxcheck.py", line 259, in <module> scan_git_subtree(repo.head.reference.commit.tree, p) File "scripts/spdxcheck.py", line 211, in scan_git_subtree scan_git_tree(tree) File "scripts/spdxcheck.py", line 206, in scan_git_tree parser.parse_lines(fd, args.maxlines, el.path) File "scripts/spdxcheck.py", line 175, in parse_lines if line.find("SPDX-License-Identifier:") < 0: TypeError: argument should be integer or bytes-like object, not 'str' The reason I opened things in binary mode when I started adding Python 3 support was because not all files were valid UTF-8 (and some were binary) so I decoded the text line-by-line and ignored any decoding errors for simplicity's sake. Regards, Jeremy ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode 2018-12-13 15:10 ` Jeremy Cline @ 2018-12-13 19:40 ` Uwe Kleine-König 0 siblings, 0 replies; 8+ messages in thread From: Uwe Kleine-König @ 2018-12-13 19:40 UTC (permalink / raw) To: Jeremy Cline Cc: Thierry Reding, Andrew Morton, Thomas Gleixner, Jonathan Corbet, Joe Perches, linux-kernel On Thu, Dec 13, 2018 at 10:10:52AM -0500, Jeremy Cline wrote: > On Thu, Dec 13, 2018 at 08:37:08AM +0100, Uwe Kleine-König wrote: > > It didn't break for me. Can you provide details about how and when it > > broke for you? > > I was wrong about it being Python 2 that broke, sorry about that. > 6f4d29df66ac broke Python 3 when you run it against a sub-tree because > scan_git_tree() opens the files in binary mode, but then find is run > with a text string: > > $ python3 scripts/spdxcheck.py net/ > FAIL: argument should be integer or bytes-like object, not 'str' > Traceback (most recent call last): > File "scripts/spdxcheck.py", line 259, in <module> > scan_git_subtree(repo.head.reference.commit.tree, p) > File "scripts/spdxcheck.py", line 211, in scan_git_subtree > scan_git_tree(tree) > File "scripts/spdxcheck.py", line 206, in scan_git_tree > parser.parse_lines(fd, args.maxlines, el.path) > File "scripts/spdxcheck.py", line 175, in parse_lines > if line.find("SPDX-License-Identifier:") < 0: > TypeError: argument should be integer or bytes-like object, not 'str' > > The reason I opened things in binary mode when I started adding Python 3 > support was because not all files were valid UTF-8 (and some were > binary) so I decoded the text line-by-line and ignored any decoding > errors for simplicity's sake. OK I understand. The problem is that there are inconsistencies in handling files as binaries or not that already existed before 6f4d29df66ac. Different code paths result in a different type for line depending on how fd was opened. I fixed the cases where fd was opened as text file and broke the cases where it was opened as binary. So changing this to consistently using binary mode (as the patch by Thierry does) seems the right thing to do. Thanks Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-12-13 19:40 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-12-12 13:12 [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Thierry Reding 2018-12-12 13:12 ` [PATCH 2/2] scripts: Add spdxcheck.py self test Thierry Reding 2018-12-12 18:14 ` [PATCH 1/2] scripts/spdxcheck.py: Always open files in binary mode Jeremy Cline 2018-12-13 2:51 ` Joe Perches 2018-12-13 16:14 ` Thierry Reding 2018-12-13 7:37 ` Uwe Kleine-König 2018-12-13 15:10 ` Jeremy Cline 2018-12-13 19:40 ` Uwe Kleine-König
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox