From: Bodo Eggert <harvested.in.lkml@7eggert.dyndns.org>
To: "H. Peter Anvin" <hpa@zytor.com>,
"Martin v. Löwis" <martin@v.loewis.de>,
linux-kernel@vger.kernel.org
Subject: Re: [Patch] Support UTF-8 scripts
Date: Fri, 16 Sep 2005 20:02:36 +0200
Message-ID: <E1EGKXl-0001Sn-GA@be1.lrz>
In-Reply-To: 4N7AS-67L-3@gated-at.bofh.it
Bernd Petrovitsch <bernd@firmix.at> wrote:
> On Thu, 2005-09-15 at 20:39 +0200, "Martin v. Löwis" wrote:
>> H. Peter Anvin wrote:
>> > In Unix, it's a hideously bad idea. The reason is that Unix inherently
>> > assumes that text streams can be merged, split, and modified. In other
>> > words, unless you can guarantee that EVERY program can handle BOM
>> > EVERYWHERE, it's broken.
You can't sort /bin/ls into /tmp/ls and expect /tmp/ls to be meaningful,
but /bin/ls works as expected. You usually can't concatenate perl scripts
and shell scripts either, but both kinds of script run quite well.
And if you do "cat /bin/cat /bin/cp > /bin/catcp", what's "catcp foo bar"
supposed to do? First output foo and bar to stdout, then copy foo to bar?
Is execve() broken if it doesn't do what I described? Is the ELF header
broken because it's not recognized EVERYWHERE? I don't think so.
>> This argument is bogus. We are talking about scripts here, which cannot
>> be merged, split, and modified. You don't cat(1) or sort(1) them - it's
>
> Sure they can since they are plain text files.
> How do you think one merges scripts?
> Just `cat`ing them all into one new file and edit that new file is much
> faster and simpler than to open an empty new file with your editor, then
> you open all the other scripts in your editor and copy them by hand.
What's supposed to happen if you concatenate a script from your French
user and one from your Russian user, both using localized text, into one file?
Unless you can guarantee every editor to correctly handle this case, all
usage of 8-bit-characters should be disabled - NOT!
If you concatenate two plain text files, you will use cat.
If you concatenate two pnm image files, you will use pnmcat.
If you concatenate two utf-8 files, you will use utf8cat.
If you concatenate two binaries, you will shoot your feet.
That's easy, isn't it?
BTW: I think decent utf-8 capable programs SHOULD ignore extra BOM markers.
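A minimal sketch of what such a utf8cat could look like, written as a userspace Python illustration. No such tool is specified in this thread; the function name `utf8cat` and the `keep_first_bom` parameter are assumptions made for the example. It keeps at most one signature at the very start and drops the leading BOM of every later input:

```python
# Hypothetical utf8cat: concatenate UTF-8 inputs without leaving
# stray BOMs in the middle of the result.
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def utf8cat(chunks, keep_first_bom=True):
    """Concatenate byte strings, stripping each input's leading BOM.

    The BOM of the first input is kept when keep_first_bom is true;
    the leading BOM of every other input is always dropped.
    """
    out = bytearray()
    for i, data in enumerate(chunks):
        has_bom = data.startswith(BOM)
        if has_bom and not (i == 0 and keep_first_bom):
            data = data[len(BOM):]
        out += data
    return bytes(out)
```

Only a BOM at the start of an input is touched; a U+FEFF in the middle of a file is left alone, since there it is ordinary content as far as this sketch is concerned.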
> And you (or at least I) do `grep`/`egrep`/`fgrep`, `wc` them.
You can *grep utf-8 scripts, but you can't *grep binaries. Shouldn't
this be fixed by implementing an in-kernel ASCII assembler and converting
all binaries to assembler text?
> And
> probably with several other tools too - think of `find <dir> -type f
> -print0 | xargs -0r <cmd>`.
utf-8 filenames will work correctly (unless used as an extended BASIC
script with non-ASCII variable names, but that would be insane).
>> just pointless to do that. You create them with text editors, and those
>> can handle the UTF-8 signature.
>
> It is not uncommon to create scripts and the like with other programs,
> other scripts, what-else.
It's not uncommon to create binaries using other programs. So what?
> Apart from the fact the a "script" is merely a plain text file with the
> eXecutable bit set.
And a utf-8 script is a utf-8 encoded text file with its executable bit
set.
> And that is the only difference, so you have to at
> least (all instances of) `chmod` to insert and remove the BOM.
[...]
In order to make it harder for the interpreter to correctly detect utf-8?
You can have DOS executables run in dosboxes, Windows applications run
in Windows, Java archives run in Java, but utf-8 scripts should be
mangled in order to work "correctly", and mangled back in order to be
editable? *That*'s insane!
Just make execve ignore the BOM marker before "#!" as the patch does, and
you're done. The rest is somebody else's not-a-problem.
BTW2: However, I don't like the patch.
I'd first check for a utf-8 signature, and if it's found, adjust the
buffer offset by 3. Then I'd run the old code checking for the sh_bang.
OTOH, I only read the patch and not the .c file, so it's possible (though
unlikely) that my idea wouldn't work correctly.
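The order of checks described above can be sketched in userspace Python. This is only an illustration of the idea, not the patch and not the kernel's binfmt code; the names `script_offset` and `interpreter_line` are made up for the example:

```python
# Illustration only: skip an optional UTF-8 signature first, then run
# the unchanged "#!" check at the adjusted offset.
UTF8_BOM = b"\xef\xbb\xbf"


def script_offset(buf):
    """Return the offset of "#!" in buf, or None if buf is not a script."""
    # Step 1: if the signature is there, move the offset past its 3 bytes.
    offset = len(UTF8_BOM) if buf.startswith(UTF8_BOM) else 0
    # Step 2: the old check, applied at the (possibly adjusted) offset.
    if buf[offset:offset + 2] == b"#!":
        return offset
    return None


def interpreter_line(buf):
    """Return the text after "#!" on the first line, or None."""
    offset = script_offset(buf)
    if offset is None:
        return None
    first_line = buf[offset + 2:].split(b"\n", 1)[0]
    return first_line.strip()
```

A file that starts with the signature but has no "#!" after it is still rejected, so the signature alone never makes something executable as a script.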
--
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.