From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] Unicode problem with check-uniq-files
Date: Thu, 22 Mar 2018 21:41:45 +0100
Message-ID: <20180322204145.GB4580@scaer>
In-Reply-To: <f965a6fb-db48-b39c-6f49-46a025fc63ad@jcz.nl>
Jaap, All,
Please keep the list in Cc next time...
On 2018-03-22 11:56 +0100, Jaap Crezee spake thusly:
> On 03/22/18 11:43, Jaap Crezee wrote:
> > ./support/scripts/check-uniq-files -t target /data/work/jcz/git/jidiot/clients/innr/buildroot_development/output/build/packages-file-list.txt
> > Traceback (most recent call last):
> >   File "./support/scripts/check-uniq-files", line 42, in <module>
> >     sys.exit(main())
> >   File "./support/scripts/check-uniq-files", line 31, in main
> >     for row in r:
> >   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >     return codecs.ascii_decode(input, self.errors)[0]
> > UnicodeDecodeError: 'ascii' codec can't decode byte
>
> Attached patch is working for me. If you agree with it, you can apply it.
> If you like I can ack. If you do not agree with this patch, what do you suggest?
> diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
> index be808cce03..82b0af24ba 100755
> --- a/support/scripts/check-uniq-files
> +++ b/support/scripts/check-uniq-files
> @@ -26,7 +26,7 @@ def main():
>          return False
>
>      file_to_pkg = defaultdict(list)
> -    with open(args.packages_file_list[0], 'r') as pkg_file_list:
> +    with open(args.packages_file_list[0], 'r', encoding="utf-8") as pkg_file_list:
>          r = csv.reader(pkg_file_list, delimiter=',')
>          for row in r:
>              pkg = row[0]
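To make the failure mode concrete, here is a minimal Python 3 sketch of what the patch changes (the sample line and the helper names are made up for illustration, not taken from the script). A bare open(path, 'r') decodes with locale.getpreferredencoding(False), which is ASCII under the C locale on Python 3.6, so a UTF-8 encoded filename fails as in the traceback above; passing encoding='utf-8' lets the same line parse. Here 'ascii' is passed explicitly to mimic the C locale.

```python
import csv
import os
import tempfile

# Made-up packages-file-list.txt line: "<package>,<path>", where the path
# contains U+00E9 encoded as UTF-8 (the two bytes 0xc3 0xa9).
SAMPLE = b"mypkg,./usr/share/caf\xc3\xa9.txt\n"


def write_sample(data):
    """Write raw bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def read_rows(path, encoding):
    """Parse the file as CSV text, decoding it with an explicit codec."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter=","))


if __name__ == "__main__":
    path = write_sample(SAMPLE)
    try:
        try:
            read_rows(path, "ascii")  # what Python 3.6 does under LANG=C
        except UnicodeDecodeError as exc:
            print("ascii:", exc.reason)  # ordinal not in range(128)
        # ascii() keeps the printed text ASCII-only, whatever the locale.
        print("utf-8:", ascii(read_rows(path, "utf-8")))
    finally:
        os.remove(path)
```

Note that this only helps when the bytes really are UTF-8; it says nothing about filenames stored in another encoding.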
I'll be testing that, but it has to work in quite a few situations:
- python 2.6, python 2.7, python 3.x
- the current locale is UTF-8 (is it LANG, or any of the other LC_* ones?),
  or it is not a UTF-8 locale.
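On the LANG vs LC_* question: for the character encoding, the governing category is LC_CTYPE, and POSIX resolves it as a non-empty LC_ALL first, then a non-empty LC_CTYPE, then LANG, falling back to the implementation default (typically "C") when none is set. The sketch below encodes that precedence; effective_ctype() and looks_utf8() are hypothetical helpers, and the name check is only a heuristic. The authoritative answer is what Python itself reports via locale.getpreferredencoding(False), which newer releases (3.7+) also influence through C-locale coercion (PEP 538) and UTF-8 mode (PEP 540).

```python
import os


def effective_ctype(env=None):
    """Return the locale name that governs LC_CTYPE (character encoding).

    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG;
    with none of them set, fall back to the default "C" locale.
    """
    if env is None:
        env = os.environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty are treated the same way
            return value
    return "C"


def looks_utf8(locale_name):
    """Heuristic: does the name (lang_TERRITORY.codeset@modifier)
    advertise a UTF-8 codeset?"""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"
```

For example, LANG=en_US.UTF-8 together with LC_ALL=C yields "C", i.e. a non-UTF-8 locale even though LANG looks fine.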
However, as we already discussed with Thomas on IRC the other day,
nothing guarantees that filenames are stored as UTF-8 byte sequences
on disk.
Since packages-file-list.txt only contains whatever 'find' puts in
there, and 'find' only outputs whatever it sees on disk, its encoding
is definitely unpredictable: it probably depends on the user's
configuration.
So, even if UTF-8 is the prevalent encoding, nothing guarantees that it
is the only one we'd ever see, AFAIU...
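A small Python 3 sketch of why forcing UTF-8 is not enough: a filename created under, say, a Latin-1 locale stores U+00E9 as the single byte 0xe9, which is not valid UTF-8, so open(..., encoding='utf-8') would still raise on it. The sample line and helper names are hypothetical. Python 3's 'surrogateescape' error handler (PEP 383) maps such bytes to lone surrogates and restores them on re-encoding; Python 2 has no such handler.

```python
# Hypothetical entry for a file whose name was created under a Latin-1
# locale: U+00E9 is the single byte 0xe9, which is invalid UTF-8.
LATIN1_LINE = b"mypkg,./usr/share/caf\xe9.txt\n"


def decode_strict(raw):
    """What open(..., encoding='utf-8') effectively does: fail on bad bytes."""
    return raw.decode("utf-8")


def decode_lossless(raw):
    """Decode with 'surrogateescape': each undecodable byte 0xNN becomes
    the lone surrogate U+DCNN instead of raising UnicodeDecodeError."""
    return raw.decode("utf-8", "surrogateescape")


def round_trip(raw):
    """Re-encode with the same handler, which restores the original bytes."""
    return decode_lossless(raw).encode("utf-8", "surrogateescape")
```

The decoded string can be used as a dictionary key, but printing it still needs care, since lone surrogates cannot be encoded with a strict codec.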
Which means that your solution is probably just a workaround that
happens to work for you and in a lot of other situations, but is not
the correct solution.
I've been hacking on that check-uniq-files script for two evenings now,
and I still don't see a good solution that makes it work with both
python2 and python3, with a UTF-8 locale or not...
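One possible direction, sketched here under stated assumptions and not as the script's actual code: never decode while parsing. Read the list in binary mode, split each line on the first comma, and group packages per path as raw bytes; decode only when printing, with 'backslashreplace' (usable for decoding since Python 3.5) so undecodable bytes show up as \xNN escapes. The helper names are hypothetical, and splitting on the first comma assumes package names never contain a comma, which also keeps commas inside paths intact. The parsing part only uses bytes operations that should also exist in Python 2.6/2.7, though that is untested here.

```python
from collections import defaultdict


def parse_lines(lines):
    """Map each path (bytes) to the set of packages (bytes) listing it.

    Nothing is decoded, so the result depends neither on the locale nor
    on the encoding the filenames happen to use on disk.
    """
    file_to_pkg = defaultdict(set)
    for line in lines:
        line = line.rstrip(b"\n")
        if not line:
            continue
        # Assumption: package names never contain a comma, so the first
        # comma separates the package from the (arbitrary) path.
        pkg, _, fname = line.partition(b",")
        file_to_pkg[fname].add(pkg)
    return file_to_pkg


def parse_file_list(path):
    """Read packages-file-list.txt in binary mode: no decoding at all."""
    with open(path, "rb") as f:
        return parse_lines(f)


def show(raw):
    """Decode for display only (Python 3.5+): undecodable bytes are shown
    as \\xNN escapes instead of raising UnicodeDecodeError."""
    return raw.decode("utf-8", "backslashreplace")


def duplicates(file_to_pkg):
    """Return (path, sorted packages) for each path claimed by >1 package."""
    return [(show(fname), sorted(show(p) for p in pkgs))
            for fname, pkgs in sorted(file_to_pkg.items())
            if len(pkgs) > 1]
```

Wired into the script, main() would write each entry of duplicates(parse_file_list(...)) to sys.stderr, whose default 'backslashreplace' error handler in Python 3 keeps a non-UTF-8 locale from raising on output.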
I was thinking that maybe we could make it an explicit python2 script
(i.e. run by 'python2', not 'python'), but some distros are now
switching to a python3-only setup, so that would break on those
distros... Do you use such a distro, by chance? Which one?
Anyway, more testing to be done here, thanks for the suggestion. I'll
report back later...
Regards,
Yann E. MORIN.
--
.-----------------.--------------------.------------------.--------------------.
| Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
| +33 223 225 172 `------------.-------: X AGAINST | \e/ There is no |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
'------------------------------^-------^------------------^--------------------'