Buildroot Archive on lore.kernel.org
From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] Unicode problem with check-uniq-files
Date: Thu, 22 Mar 2018 21:41:45 +0100
Message-ID: <20180322204145.GB4580@scaer>
In-Reply-To: <f965a6fb-db48-b39c-6f49-46a025fc63ad@jcz.nl>

Jaap, All,

Please, keep the list in Cc next time...

On 2018-03-22 11:56 +0100, Jaap Crezee spake thusly:
> On 03/22/18 11:43, Jaap Crezee wrote:
> > ./support/scripts/check-uniq-files -t target /data/work/jcz/git/jidiot/clients/innr/buildroot_development/output/build/packages-file-list.txt
> > Traceback (most recent call last):
> >   File "./support/scripts/check-uniq-files", line 42, in <module>
> >     sys.exit(main())
> >   File "./support/scripts/check-uniq-files", line 31, in main
> >     for row in r:
> >   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >     return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte
> 
> Attached patch is working for me. If you agree with it, you can apply it.
> If you like I can ack. If you do not agree with this patch, what do you suggest?

> diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
> index be808cce03..82b0af24ba 100755
> --- a/support/scripts/check-uniq-files
> +++ b/support/scripts/check-uniq-files
> @@ -26,7 +26,7 @@ def main():
>          return False
>  
>      file_to_pkg = defaultdict(list)
> -    with open(args.packages_file_list[0], 'r') as pkg_file_list:
> +    with open(args.packages_file_list[0], 'r', encoding="utf-8") as pkg_file_list:
>          r = csv.reader(pkg_file_list, delimiter=',')
>          for row in r:
>              pkg = row[0]

I'll be testing that, but it has to work in quite a few situations:

  - python 2.6, python 2.7, python 3.x

  - the current locale is UTF-8 (is that set by LANG, or by any of the
    LC_* variables?), or it is not a UTF-8 locale.
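
On the LANG vs. LC_* question, a minimal sketch follows. POSIX gives the
precedence LC_ALL, then the category variable (LC_CTYPE for character
encoding), then LANG. In the Python 3.6 shown in the traceback, a
text-mode open() without an encoding argument falls back to
locale.getpreferredencoding(False). The helper name
effective_ctype_setting is made up for this illustration and is not part
of check-uniq-files.

```python
import locale
import os


def effective_ctype_setting(env=None):
    """Return (variable, value) for the first non-empty variable among
    LC_ALL, LC_CTYPE and LANG, which is the POSIX precedence order.

    Hypothetical helper, for illustration only.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return var, value
    # None of the three is set (or all are empty).
    return None, None


if __name__ == "__main__":
    # Which variable decides the character encoding in this environment.
    print(effective_ctype_setting())
    # The encoding Python reports for the current locale.
    print(locale.getpreferredencoding(False))
```

Running this under LC_ALL=C and again under a UTF-8 locale should show
the kind of difference that makes the original script behave differently
from one machine to another.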

However, we already discussed this with Thomas on IRC the other day, and
nothing guarantees that filenames are stored as UTF-8 streams on disk.

Since packages-file-list.txt only contains whatever 'find' puts in
there, and 'find' only outputs whatever it sees on disk, the encoding
of that file is unpredictable, and probably depends on the user's
configuration.

So, even if UTF-8 is the prevalent encoding, nothing guarantees that it
is the only one we'd ever see, AFAIU...
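
One way to sidestep the encoding question is to never decode the file
names at all. The sketch below is an assumption-laden illustration, not
the fix that was applied to check-uniq-files: it assumes each line of
packages-file-list.txt has the form "package,filename" (the diff above
only confirms that the package is the first comma-separated field), and
the function name map_files_to_packages is hypothetical. It reads the
file in binary mode and only uses bytes literals, so it should behave
the same with python 2.6, 2.7 and 3.x, whatever the locale.

```python
from collections import defaultdict


def map_files_to_packages(path):
    """Map each file name (kept as raw bytes) to the packages listing it.

    Hypothetical helper; assumes "package,filename" lines.
    """
    file_to_pkg = defaultdict(list)
    # Binary mode: no codec is involved, so no UnicodeDecodeError.
    with open(path, 'rb') as pkg_file_list:
        for line in pkg_file_list:
            line = line.rstrip(b'\n')
            if not line:
                continue
            # Split on the first comma only, so that commas inside the
            # file name are preserved.
            pkg, _, fname = line.partition(b',')
            file_to_pkg[fname].append(pkg)
    return file_to_pkg
```

The price is that the names stay opaque bytes: printing them in a
warning still requires choosing some way to display undecodable bytes.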

Which means that your solution is probably only a workaround that
happens to work for you and in a lot of other situations, but is not
the correct solution.
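
To make the limitation concrete, here is a small sketch with a made-up
file name. The byte string below is "café.txt" as it would be stored on
a filesystem populated under an ISO-8859-1 locale; forcing
encoding="utf-8" on such a name fails just like the ASCII default did.
The helper name decodes_as_utf8 is hypothetical.

```python
def decodes_as_utf8(raw):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


# The same (made-up) name in two on-disk encodings.
latin1_name = b'caf\xe9.txt'      # ISO-8859-1 encoding of "café.txt"
utf8_name = b'caf\xc3\xa9.txt'    # UTF-8 encoding of "café.txt"

if __name__ == "__main__":
    print(decodes_as_utf8(utf8_name))
    print(decodes_as_utf8(latin1_name))
```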

I've been hacking on that check-uniq-files script for two evenings now,
and I still don't see a good solution that makes it work in both python2
and python3, with a UTF-8 locale or not...

I was thinking that maybe we could make it a python2 (not python)
script, but then some distros are switching to a python3-only setup
now, so that would break on those distros... Do you use such a distro,
by chance? Which one?

Anyway, more testing to be done here, thanks for the suggestion. I'll
report back later...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'
