From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
Date: Tue, 8 Jan 2019 17:29:32 +0100 [thread overview]
Message-ID: <20190108162932.GJ19623@scaer> (raw)
In-Reply-To: <CAAXf6LUTE+HoEqZPAvSLgo4MEUjAX8TJPWV3Nku5+CRbS+Ookg@mail.gmail.com>
Thomas DS, All,
On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > Furthermore, that format is not easily extensible.
> >
> > So, introduce a parser, in python, that abstracts the format of these
> > files, and returns a dictionaries. Using dictionaries makes it easy for
> returns a dictionary.
Actually, a call to the function will return a list of dictionarries,
one for each file in the list. They are yielded, so returned one by one
to iterate over easily, but many dictionaries are returned.
So: s/a /an iterator to a list of /
(I believe this to be an iterator, right?)
> > +def parse_pkg_file_list(path):
> > + with open(path, 'rb') as f:
>
> Why exactly do you open as binary?
Because it contains filenames, and filenames are a sequence of bytes,
not encioded in any way. Filenames can contain 0xbd (? in iso8859-15)
or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
contain both. It is not a hypotetical situation, as already encountered
it more than once.
There is no way we can know the encoding of a filename, so we treat them
for what they are: sequence of bytes.
> IIRC this will cause the need for decoding the strings on Python 3,
> which you don't seem to do. This was also the reason why the orginal
> check-uniq-files needed 'b' markers in front of some strings like
> b','.
Exactly, and hence the reason that check-uniq-files will try to decide
those strings.
There might be a gap in the transition, though, as size-stat may need to
represent those strings when generating the graphs, or generating the
CVS files... Damned...
> > + for rec in f.readlines():
> > + l = rec.split(',0')
>
> This looks wrong to me, there is no text ',0' in the input.
> I think you meant rec.split(',', 1), no, like in the original code?
Yeah, I borked a rebase at some point...
> > + d = {
> > + 'file': l[0],
> > + 'pkg': l[1],
>
> This seems also wrong, 0 and 1 are swapped.
Yeah, I borked a rebase at some point.
> Example input is:
>
> busybox,./usr/bin/eject
> busybox,./usr/bin/basename
> busybox,./usr/bin/hexedit
> busybox,./usr/bin/readlink
> busybox,./usr/bin/cmp
>
> so I would expect 'pkg' to be l[0] and 'file' to be l[1].
>
> Note that 'l' could perhaps become a more meaningful variable name. I
> also think that one of the python rules is that variable names should
> be minimum 3 characters.
So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
at some point, where I renamed 'l' and it ended up in a later commit.
I'll fix that here.
> > - with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > - for line in pkg_file_list.readlines():
> > - pkg, _, file = line.rstrip(b'\n').partition(b',')
> Note that the rstrip of newlines is no longer present in parse_pkg_file_list.
Good catch. I'll fix.
> > + fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > + for record in parse_pkg_file_list(fname):
> > + # remove the initial './' in each file path
> > + fpath = record['file'].strip()[2:]
>
> The stripping here and the rstrip in check-uniq-files could perhaps
> all be moved to parse_pkg_file_list.
So, I am not sure about this one. I'm not even sure why we don't keep
the '/' either. After all, they are target files, so they should be
representable as they appear on the target, i.e. with a leading '/'
In any case, I wanted the introduction of the parser to be a drop-in
replacement for the existing ad-hoc parsers, as much as possible.
The seamantic of stripping the leading './' can be commonalised (or
fixed, or dropped) in a later patch, when this series is already big
enough as it is, and already touching (pun!) touchy (re-pun!) topics.
Regards,
Yann E. MORIN.
> > + fullpath = os.path.join(builddir, "target", fpath)
> > + add_file(filesdict, fpath, fullpath, record['pkg'])
> > return filesdict
> >
>
> /Thomas
--
.-----------------.--------------------.------------------.--------------------.
| Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
| +33 223 225 172 `------------.-------: X AGAINST | \e/ There is no |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
'------------------------------^-------^------------------^--------------------'
next prev parent reply other threads:[~2019-01-08 16:29 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) " Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
2019-01-08 13:07 ` Thomas De Schampheleire
2019-01-08 15:56 ` Thomas De Schampheleire
2019-01-08 19:51 ` Yann E. MORIN
2019-01-08 16:08 ` Yann E. MORIN
2019-01-08 16:55 ` Thomas De Schampheleire
2019-01-08 20:02 ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
2019-02-07 23:40 ` Arnout Vandecappelle
2019-02-08 17:25 ` Yann E. MORIN
2019-02-08 20:42 ` Arnout Vandecappelle
2019-02-08 21:22 ` Yann E. MORIN
2019-02-08 22:02 ` Arnout Vandecappelle
2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
2019-01-08 13:35 ` Thomas De Schampheleire
2019-01-08 16:29 ` Yann E. MORIN [this message]
2019-01-08 17:30 ` Thomas De Schampheleire
2019-01-08 20:52 ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
2019-01-08 14:56 ` Thomas De Schampheleire
2019-01-08 16:37 ` Yann E. MORIN
2019-01-08 17:22 ` Thomas De Schampheleire
2019-01-08 20:33 ` Yann E. MORIN
2019-01-08 20:46 ` Thomas De Schampheleire
2019-01-08 21:16 ` Yann E. MORIN
2019-01-09 14:47 ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
2019-01-08 15:07 ` Thomas De Schampheleire
2019-01-08 19:27 ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
2019-01-08 15:13 ` Thomas De Schampheleire
2019-01-08 19:31 ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
2019-01-08 15:22 ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
2019-01-08 15:24 ` Thomas De Schampheleire
2019-01-08 19:36 ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
2019-01-08 15:29 ` Thomas De Schampheleire
2019-01-08 19:44 ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file Yann E. MORIN
2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
2019-01-08 15:53 ` Thomas De Schampheleire
2019-01-08 21:30 ` Yann E. MORIN
2019-01-09 13:39 ` Thomas De Schampheleire
2019-01-09 18:10 ` Yann E. MORIN
2019-01-10 20:34 ` Thomas De Schampheleire
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190108162932.GJ19623@scaer \
--to=yann.morin.1998@free.fr \
--cc=buildroot@busybox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox