Buildroot Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
Date: Tue, 8 Jan 2019 17:29:32 +0100	[thread overview]
Message-ID: <20190108162932.GJ19623@scaer> (raw)
In-Reply-To: <CAAXf6LUTE+HoEqZPAvSLgo4MEUjAX8TJPWV3Nku5+CRbS+Ookg@mail.gmail.com>

Thomas DS, All,

On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > Furthermore, that format is not easily extensible.
> >
> > So, introduce a parser, in python, that abstracts the format of these
> > files, and returns a dictionaries. Using dictionaries makes it easy for
> returns a dictionary.

Actually, a call to the function will return a list of dictionarries,
one for each file in the list. They are yielded, so returned one by one
to iterate over easily, but many dictionaries are returned.

So:   s/a /an iterator to a list of /

(I believe this to be an iterator, right?)

> > +def parse_pkg_file_list(path):
> > +    with open(path, 'rb') as f:
> 
> Why exactly do you open as binary?

Because it contains filenames, and filenames are a sequence of bytes,
not encioded in any way. Filenames can contain 0xbd (? in iso8859-15)
or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
contain both. It is not a hypotetical situation, as already encountered
it more than once.

There is no way we can know the encoding of a filename, so we treat them
for what they are: sequence of bytes.

> IIRC this will cause the need for decoding the strings on Python 3,
> which you don't seem to do. This was also the reason why the orginal
> check-uniq-files needed 'b' markers in front of some strings like
> b','.

Exactly, and hence the reason that check-uniq-files will try to decide
those strings.

There might be a gap in the transition, though, as size-stat may need to
represent those strings when generating the graphs, or generating the
CVS files... Damned...

> > +        for rec in f.readlines():
> > +            l = rec.split(',0')
> 
> This looks wrong to me, there is no text ',0' in the input.
> I think you meant rec.split(',', 1), no, like in the original code?

Yeah, I borked a rebase at some point...

> > +            d = {
> > +                  'file': l[0],
> > +                  'pkg':  l[1],
> 
> This seems also wrong, 0 and 1 are swapped.

Yeah, I borked a rebase at some point.

> Example input is:
> 
> busybox,./usr/bin/eject
> busybox,./usr/bin/basename
> busybox,./usr/bin/hexedit
> busybox,./usr/bin/readlink
> busybox,./usr/bin/cmp
> 
> so I would expect 'pkg' to be l[0] and 'file' to be l[1].
> 
> Note that 'l' could perhaps become a more meaningful variable name. I
> also think that one of the python rules is that variable names should
> be minimum 3 characters.

So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
at some point, where I renamed 'l' and it ended up in a later commit.

I'll fix that here.

> > -    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > -        for line in pkg_file_list.readlines():
> > -            pkg, _, file = line.rstrip(b'\n').partition(b',')
> Note that the rstrip of newlines is no longer present in parse_pkg_file_list.

Good catch. I'll fix.

> > +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > +    for record in parse_pkg_file_list(fname):
> > +        # remove the initial './' in each file path
> > +        fpath = record['file'].strip()[2:]
> 
> The stripping here and the rstrip in check-uniq-files could perhaps
> all be moved to parse_pkg_file_list.

So, I am not sure about this one. I'm not even sure why we don't keep
the '/' either. After all, they are target files, so they should be
representable as they appear on the target, i.e. with a leading '/'

In any case, I wanted the introduction of the parser to be a drop-in
replacement for the existing ad-hoc parsers, as much as possible.

The seamantic of stripping the leading './' can be commonalised (or
fixed, or dropped) in a later patch, when this series is already big
enough as it is, and already touching (pun!) touchy (re-pun!) topics.

Regards,
Yann E. MORIN.

> > +        fullpath = os.path.join(builddir, "target", fpath)
> > +        add_file(filesdict, fpath, fullpath, record['pkg'])
> >      return filesdict
> >
> 
> /Thomas

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

  reply	other threads:[~2019-01-08 16:29 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) " Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
2019-01-08 13:07   ` Thomas De Schampheleire
2019-01-08 15:56     ` Thomas De Schampheleire
2019-01-08 19:51       ` Yann E. MORIN
2019-01-08 16:08     ` Yann E. MORIN
2019-01-08 16:55       ` Thomas De Schampheleire
2019-01-08 20:02         ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
2019-02-07 23:40   ` Arnout Vandecappelle
2019-02-08 17:25     ` Yann E. MORIN
2019-02-08 20:42       ` Arnout Vandecappelle
2019-02-08 21:22         ` Yann E. MORIN
2019-02-08 22:02           ` Arnout Vandecappelle
2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
2019-01-08 13:35   ` Thomas De Schampheleire
2019-01-08 16:29     ` Yann E. MORIN [this message]
2019-01-08 17:30       ` Thomas De Schampheleire
2019-01-08 20:52         ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
2019-01-08 14:56   ` Thomas De Schampheleire
2019-01-08 16:37     ` Yann E. MORIN
2019-01-08 17:22       ` Thomas De Schampheleire
2019-01-08 20:33         ` Yann E. MORIN
2019-01-08 20:46           ` Thomas De Schampheleire
2019-01-08 21:16             ` Yann E. MORIN
2019-01-09 14:47               ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
2019-01-08 15:07   ` Thomas De Schampheleire
2019-01-08 19:27     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
2019-01-08 15:13   ` Thomas De Schampheleire
2019-01-08 19:31     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
2019-01-08 15:22   ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
2019-01-08 15:24   ` Thomas De Schampheleire
2019-01-08 19:36     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
2019-01-08 15:29   ` Thomas De Schampheleire
2019-01-08 19:44     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file Yann E. MORIN
2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
2019-01-08 15:53   ` Thomas De Schampheleire
2019-01-08 21:30     ` Yann E. MORIN
2019-01-09 13:39       ` Thomas De Schampheleire
2019-01-09 18:10         ` Yann E. MORIN
2019-01-10 20:34           ` Thomas De Schampheleire

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190108162932.GJ19623@scaer \
    --to=yann.morin.1998@free.fr \
    --cc=buildroot@busybox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox