All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files
Date: Tue, 8 Jan 2019 17:29:32 +0100	[thread overview]
Message-ID: <20190108162932.GJ19623@scaer> (raw)
In-Reply-To: <CAAXf6LUTE+HoEqZPAvSLgo4MEUjAX8TJPWV3Nku5+CRbS+Ookg@mail.gmail.com>

Thomas DS, All,

On 2019-01-08 14:35 +0100, Thomas De Schampheleire spake thusly:
> El lun., 7 ene. 2019 a las 23:06, Yann E. MORIN
> (<yann.morin.1998@free.fr>) escribi?:
[--SNIP--]
> > Furthermore, that format is not easily extensible.
> >
> > So, introduce a parser, in python, that abstracts the format of these
> > files, and returns a dictionaries. Using dictionaries makes it easy for
> returns a dictionary.

Actually, a call to the function will return a list of dictionarries,
one for each file in the list. They are yielded, so returned one by one
to iterate over easily, but many dictionaries are returned.

So:   s/a /an iterator to a list of /

(I believe this to be an iterator, right?)

> > +def parse_pkg_file_list(path):
> > +    with open(path, 'rb') as f:
> 
> Why exactly do you open as binary?

Because it contains filenames, and filenames are a sequence of bytes,
not encioded in any way. Filenames can contain 0xbd (? in iso8859-15)
or it may contain 0x6f65 which is U+0153 encoded in UTF-8. It may even
contain both. It is not a hypotetical situation, as already encountered
it more than once.

There is no way we can know the encoding of a filename, so we treat them
for what they are: sequence of bytes.

> IIRC this will cause the need for decoding the strings on Python 3,
> which you don't seem to do. This was also the reason why the orginal
> check-uniq-files needed 'b' markers in front of some strings like
> b','.

Exactly, and hence the reason that check-uniq-files will try to decide
those strings.

There might be a gap in the transition, though, as size-stat may need to
represent those strings when generating the graphs, or generating the
CVS files... Damned...

> > +        for rec in f.readlines():
> > +            l = rec.split(',0')
> 
> This looks wrong to me, there is no text ',0' in the input.
> I think you meant rec.split(',', 1), no, like in the original code?

Yeah, I borked a rebase at some point...

> > +            d = {
> > +                  'file': l[0],
> > +                  'pkg':  l[1],
> 
> This seems also wrong, 0 and 1 are swapped.

Yeah, I borked a rebase at some point.

> Example input is:
> 
> busybox,./usr/bin/eject
> busybox,./usr/bin/basename
> busybox,./usr/bin/hexedit
> busybox,./usr/bin/readlink
> busybox,./usr/bin/cmp
> 
> so I would expect 'pkg' to be l[0] and 'file' to be l[1].
> 
> Note that 'l' could perhaps become a more meaningful variable name. I
> also think that one of the python rules is that variable names should
> be minimum 3 characters.

So, flake8 does whine about 'l' but not about 'd'. And I borked a rebase
at some point, where I renamed 'l' and it ended up in a later commit.

I'll fix that here.

> > -    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
> > -        for line in pkg_file_list.readlines():
> > -            pkg, _, file = line.rstrip(b'\n').partition(b',')
> Note that the rstrip of newlines is no longer present in parse_pkg_file_list.

Good catch. I'll fix.

> > +    fname = os.path.join(builddir, "build", "packages-file-list.txt")
> > +    for record in parse_pkg_file_list(fname):
> > +        # remove the initial './' in each file path
> > +        fpath = record['file'].strip()[2:]
> 
> The stripping here and the rstrip in check-uniq-files could perhaps
> all be moved to parse_pkg_file_list.

So, I am not sure about this one. I'm not even sure why we don't keep
the '/' either. After all, they are target files, so they should be
representable as they appear on the target, i.e. with a leading '/'

In any case, I wanted the introduction of the parser to be a drop-in
replacement for the existing ad-hoc parsers, as much as possible.

The seamantic of stripping the leading './' can be commonalised (or
fixed, or dropped) in a later patch, when this series is already big
enough as it is, and already touching (pun!) touchy (re-pun!) topics.

Regards,
Yann E. MORIN.

> > +        fullpath = os.path.join(builddir, "target", fpath)
> > +        add_file(filesdict, fpath, fullpath, record['pkg'])
> >      return filesdict
> >
> 
> /Thomas

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

  reply	other threads:[~2019-01-08 16:29 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07 22:05 [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 01/19] infra/pkg-generic: display MESSAGE before running PRE_HOOKS Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 02/19] infra/pkg-generic: create $(@D) " Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 03/19] infra/pkg-generic: introduce new stampfile at the beginning of all steps Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 04/19] infra/pkg-generic: use \0 to separate .la files as they are found Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 05/19] infra/pkg-generic: tweak only .la files installed by the current package Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 06/19] infra/pkg-generic: only list " Yann E. MORIN
2019-01-08 13:07   ` Thomas De Schampheleire
2019-01-08 15:56     ` Thomas De Schampheleire
2019-01-08 19:51       ` Yann E. MORIN
2019-01-08 16:08     ` Yann E. MORIN
2019-01-08 16:55       ` Thomas De Schampheleire
2019-01-08 20:02         ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 07/19] infra/pkg-generic: offload same-package filtering to check-uniq-file Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 08/19] support/check-uniq-files: decode as many strings as possible Yann E. MORIN
2019-02-07 23:40   ` Arnout Vandecappelle
2019-02-08 17:25     ` Yann E. MORIN
2019-02-08 20:42       ` Arnout Vandecappelle
2019-02-08 21:22         ` Yann E. MORIN
2019-02-08 22:02           ` Arnout Vandecappelle
2019-01-07 22:05 ` [Buildroot] [PATCH 09/19] support: add parser in python for packages-file-list files Yann E. MORIN
2019-01-08 13:35   ` Thomas De Schampheleire
2019-01-08 16:29     ` Yann E. MORIN [this message]
2019-01-08 17:30       ` Thomas De Schampheleire
2019-01-08 20:52         ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 10/19] support: rewrite check-bin-arch in python Yann E. MORIN
2019-01-08 14:56   ` Thomas De Schampheleire
2019-01-08 16:37     ` Yann E. MORIN
2019-01-08 17:22       ` Thomas De Schampheleire
2019-01-08 20:33         ` Yann E. MORIN
2019-01-08 20:46           ` Thomas De Schampheleire
2019-01-08 21:16             ` Yann E. MORIN
2019-01-09 14:47               ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 11/19] support: introduce new format for packages-file-list files Yann E. MORIN
2019-01-08 15:07   ` Thomas De Schampheleire
2019-01-08 19:27     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 12/19] infra/pkg-generic: store md5 of just-installed files Yann E. MORIN
2019-01-08 15:13   ` Thomas De Schampheleire
2019-01-08 19:31     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 13/19] support/check-uniq-file: invert condition logic Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 14/19] support/check-uniq-files: don't report files of the same content Yann E. MORIN
2019-01-08 15:22   ` Thomas De Schampheleire
2019-01-07 22:05 ` [Buildroot] [PATCH 15/19] support/check-uniq-files: use argparse to enfore required options Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 16/19] core: check unique files in the corresponding finalize step Yann E. MORIN
2019-01-08 15:24   ` Thomas De Schampheleire
2019-01-08 19:36     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 17/19] core: check for unique target files after all our cleanups Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 18/19] core: ignore non-unique files that have disapeared Yann E. MORIN
2019-01-08 15:29   ` Thomas De Schampheleire
2019-01-08 19:44     ` Yann E. MORIN
2019-01-07 22:05 ` [Buildroot] [PATCH 19/19] core: add optional failure when 2+ packages touch the same file Yann E. MORIN
2019-01-08 12:51 ` [Buildroot] [PATCH 00/19] support: limit install-time instrumentation to current package's files (branch yem/files-list-2) Thomas De Schampheleire
2019-01-08 15:53   ` Thomas De Schampheleire
2019-01-08 21:30     ` Yann E. MORIN
2019-01-09 13:39       ` Thomas De Schampheleire
2019-01-09 18:10         ` Yann E. MORIN
2019-01-10 20:34           ` Thomas De Schampheleire

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190108162932.GJ19623@scaer \
    --to=yann.morin.1998@free.fr \
    --cc=buildroot@busybox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.