From: "\"Martin v. Löwis\"" <martin@v.loewis.de>
To: Bernd Petrovitsch <bernd@firmix.at>
Cc: "\"Martin v. Löwis\"" <martin@v.loewis.de>,
"H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org
Subject: Re: [Patch] Support UTF-8 scripts
Date: Sun, 18 Sep 2005 09:23:38 +0200 [thread overview]
Message-ID: <432D15FA.8030100@v.loewis.de> (raw)
In-Reply-To: <1126996093.3373.21.camel@gimli.at.home>
Bernd Petrovitsch wrote:
> Most of the text editors have ways to markup the source files. Not even
> the various editors are able to agreen on one method for all, so why
> could the (Linux) world agree on one for all text files?
You are ignoring the role of standardization. People invent their own
mechanism if a standard is missing (or virtually unimplementable). For
declaring encodings, there is no standard (except of iso-2022, which
is really hard to implement correctly). Therefore, editor authors
create their own standards.
Atleast Python abstained from creating yet another standard, and instead
supports both the declarations from Emacs and vim. To some degree, it
also supports notepad (namely through the UTF-8 signature).
However, people are much more likely to agree on a technology when it
is defined by a recognized standards body. This is the case for the
UTF-8 signature, which is defined by the Unicode consortium, for
precisely this purpose. Therefore, editors *will* agree on that
mechanism, while keeping their own mechanism for the more general
problem.
>>Even for the programming language, it is a pain to implement: what
>>if you have non-ASCII characters before the pragma that declares the
>>encoding? and so on.
>
>
> That's the problem of the language definers who absolutely want such
> (IMHO absolutely superflous) features.
It's not the language designers who absolutely want this feature. It's
the language users. Of course, you'ld have to be a language designer to
know that fact - language users go to the language designers asking for
the feature, not to the kernel developers.
>>Hmm. What does that have to do with the patch I'm proposing? This
>>patch does *not* interfere with all text files. It is only relevant
>>for executable files starting with the #! magic.
>
>
> It *does* interfere since scripts are also text files in every aspect.
> So every feature you want for "scripts" you also get for text files (and
> vice versa BTW).
The specific feature I get is that when I pass a file starting
with <utf8sig>#! to execve, Linux will execute the file following
the #!. In what way do I get this feature for text in general?
And if I do, why is that a problem?
> If you think "script" and "text file" are different, define both of
> them, please, otherwise a discussion is pointless.
A script file (in the context of this discussion) is a text file
that is executable (i.e. has the appropriate subset of
S_IXUSR|S_IXGRP|S_IXOTH set), starts with #!, and has the path
name of an executable file after the #!.
More generally, a script file is a text file written in a scripting
language. A scripting language is a programming language which
supports "direct" execution of source code. So in the more
general definition, a script file does not need to start with
#!; for the context of this discussion, we should restrict
attention to files actually affected by the patch.
>>This conclusion is false. Many tools that don't understand the file
>>structure still can do their job on the files. So the fact that a tool
>>does not understand the structure does not necessarily imply that
>>the tool breaks when the structure changes.
>
>
> It *may* break just because of some to-be-ignored inline marking due to
> some questionable feature.
Be more specific. For what specific kind of file will cat(1) break?
Unless cat(1) has a 2GB limitation, I very much doubt it will break
(i.e. fail to do its job, "concatenate files and print on the standard
output") for any kind of input - whether this is text files, binary
files, images, sound files, HTML files. cat always does what it is
designed to do.
> Let alone the confusion why the size of a file with `ls -l` is different
> from the size in the editor or a marker-aware `wc -c`.
This is true for any UTF-8 file, or any multibyte encoding. For any
multibyte encoding, the number of bytes in the file is different from
the number of characters. That doesn't (and shouldn't) stop people from
using multi-byte encodings.
What the editor displays as the number of "things" is up to its own.
The output of wc -c will always be the same as the one of ls -l,
as wc -c does *not* give you characters:
-c, --bytes
print the byte counts
You might have been thinking of 'wc -m'.
>>For a Python script, I don't need to guess: It will just work.
>
>
> Then write a short python script (with a "#!/usr/bin/python" line at the
> start [without parameters]) natively on a Win*-system, copy it binary
> over to an arbitrary Linux system and see what's happening.
It depends on the editor I use, of course: the kernel will consider any
CR after the n as part of the interpreter name. Not sure what this has
to do with the specific patch, though.
Regards,
Martin
next prev parent reply other threads:[~2005-09-18 7:23 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <4NsP0-3YF-11@gated-at.bofh.it>
[not found] ` <4NsP0-3YF-13@gated-at.bofh.it>
[not found] ` <4NsP0-3YF-15@gated-at.bofh.it>
[not found] ` <4NsP0-3YF-17@gated-at.bofh.it>
[not found] ` <4NsP1-3YF-19@gated-at.bofh.it>
[not found] ` <4NsP1-3YF-21@gated-at.bofh.it>
[not found] ` <4NsOZ-3YF-9@gated-at.bofh.it>
[not found] ` <4NsYH-4bv-27@gated-at.bofh.it>
[not found] ` <4NtBr-4WU-3@gated-at.bofh.it>
[not found] ` <4NtL0-5lQ-13@gated-at.bofh.it>
2005-09-16 20:34 ` [Patch] Support UTF-8 scripts "Martin v. Löwis"
2005-09-17 12:01 ` Martin Mares
2005-09-17 12:25 ` "Martin v. Löwis"
2005-09-17 12:28 ` Martin Mares
2005-09-17 12:53 ` "Martin v. Löwis"
2005-09-17 13:05 ` Martin Mares
2005-09-17 13:33 ` "Martin v. Löwis"
2005-09-19 7:08 ` Pavel Machek
2005-09-19 7:18 ` "Martin v. Löwis"
2005-09-19 7:24 ` Pavel Machek
2005-09-19 7:46 ` "Martin v. Löwis"
2005-09-19 7:50 ` Pavel Machek
2005-09-19 10:48 ` Alan Cox
2005-09-19 23:49 ` Horst von Brand
[not found] ` <4Nu4p-5Js-3@gated-at.bofh.it>
2005-09-16 20:41 ` "Martin v. Löwis"
2005-09-16 22:08 ` H. Peter Anvin
2005-09-17 6:05 ` "Martin v. Löwis"
2005-09-16 22:45 ` Bernd Petrovitsch
2005-09-17 6:20 ` "Martin v. Löwis"
2005-09-17 22:28 ` Bernd Petrovitsch
2005-09-18 7:23 ` "Martin v. Löwis" [this message]
2005-09-18 14:50 ` Bernd Petrovitsch
2005-09-17 6:45 ` "Martin v. Löwis"
[not found] ` <4NXfZ-5P0-1@gated-at.bofh.it>
[not found] ` <4NYlM-7i0-5@gated-at.bofh.it>
[not found] ` <4Olip-6HH-13@gated-at.bofh.it>
2005-09-19 4:41 ` "Martin v. Löwis"
[not found] <4NVHm-3yE-13@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-15@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-17@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-19@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-21@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-23@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-25@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-27@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-29@gated-at.bofh.it>
[not found] ` <4NVHm-3yE-31@gated-at.bofh.it>
[not found] ` <4NVHn-3yE-33@gated-at.bofh.it>
[not found] ` <4NVHn-3yE-35@gated-at.bofh.it>
[not found] ` <4NVHn-3yE-37@gated-at.bofh.it>
[not found] ` <4NVHn-3yE-39@gated-at.bofh.it>
[not found] ` <4Od1x-3e3-5@gated-at.bofh.it>
[not found] ` <4Od1x-3e3-7@gated-at.bofh.it>
[not found] ` <4Od1w-3e3-3@gated-at.bofh.it>
[not found] ` <4OfZo-7AG-21@gated-at.bofh.it>
2005-09-19 5:11 ` "Martin v. Löwis"
[not found] <4Nvab-7o5-11@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-13@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-15@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-17@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-19@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-21@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-23@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-25@gated-at.bofh.it>
[not found] ` <4Nvab-7o5-27@gated-at.bofh.it>
[not found] ` <4NvjM-7CU-7@gated-at.bofh.it>
[not found] ` <4NvjM-7CU-5@gated-at.bofh.it>
[not found] ` <4NxbR-20S-1@gated-at.bofh.it>
[not found] ` <4NEn7-3M5-7@gated-at.bofh.it>
[not found] ` <4NTvO-yJ-13@gated-at.bofh.it>
2005-09-18 0:53 ` Bodo Eggert
2005-09-18 16:53 ` Bernd Petrovitsch
[not found] ` <4O1MJ-3Hf-5@gated-at.bofh.it>
[not found] ` <4O8Oh-5jp-7@gated-at.bofh.it>
2005-09-18 19:23 ` Bodo Eggert
2005-09-18 21:03 ` Bernd Petrovitsch
2005-09-19 19:37 ` Bodo Eggert
2005-09-18 22:29 ` Valdis.Kletnieks
2005-09-19 6:03 ` H. Peter Anvin
2005-09-19 4:54 ` "Martin v. Löwis"
2005-09-19 8:26 ` Bernd Petrovitsch
2005-09-19 9:00 ` Valdis.Kletnieks
2005-09-19 9:41 ` Bernd Petrovitsch
2005-09-19 21:40 ` "Martin v. Löwis"
[not found] <4N6EL-4Hq-3@gated-at.bofh.it>
[not found] ` <4N6EL-4Hq-5@gated-at.bofh.it>
[not found] ` <4N6EK-4Hq-1@gated-at.bofh.it>
[not found] ` <4N6EX-4Hq-27@gated-at.bofh.it>
[not found] ` <4N6Ox-4Ts-33@gated-at.bofh.it>
[not found] ` <4N7AS-67L-3@gated-at.bofh.it>
2005-09-16 18:02 ` Bodo Eggert
2005-09-16 18:09 ` H. Peter Anvin
2005-09-16 18:57 ` Bodo Eggert
2005-09-16 19:08 ` Martin Mares
2005-09-16 19:25 ` H. Peter Anvin
2005-09-16 19:57 ` Horst von Brand
[not found] ` <200509170028.59973.dhazelton@enter.net>
2005-09-17 6:28 ` "Martin v. Löwis"
2005-09-17 22:31 ` D. Hazelton
2005-09-18 3:45 ` Kyle Moffett
2005-09-19 0:14 ` D. Hazelton
2005-09-18 6:58 ` "Martin v. Löwis"
2005-09-19 0:31 ` D. Hazelton
2005-09-17 17:16 ` Bodo Eggert
[not found] <4B2ZV-2dl-7@gated-at.bofh.it>
[not found] ` <4HKbZ-Cx-37@gated-at.bofh.it>
2005-09-15 18:24 ` "Martin v. Löwis"
2005-09-15 18:25 ` H. Peter Anvin
2005-09-15 18:39 ` "Martin v. Löwis"
2005-09-15 19:20 ` H. Peter Anvin
2005-09-16 8:13 ` Bernd Petrovitsch
2005-08-13 12:07 "Martin v. Löwis"
2005-08-13 16:35 ` Stephen Pollei
2005-08-13 18:42 ` Lee Revell
2005-08-13 18:49 ` Hugo Mills
2005-08-13 18:53 ` Lee Revell
2005-08-14 0:57 ` Alan Cox
2005-08-14 1:19 ` Kyle Moffett
2005-08-14 1:40 ` Lee Revell
2005-08-14 10:40 ` Wichert Akkerman
2005-08-13 19:20 ` Lee Revell
2005-08-16 9:46 ` Jan Engelhardt
2005-08-14 0:53 ` Alan Cox
2005-08-14 4:10 ` James Cloos
2005-08-14 6:18 ` Jason L Tibbitts III
[not found] ` <feed8cdd050814125845fe4e2e@mail.gmail.com>
2005-08-14 19:59 ` Lee Revell
2005-08-14 20:13 ` Stephen Pollei
2005-08-14 20:22 ` Lee Revell
2005-08-14 22:10 ` "Martin v. Löwis"
2005-08-14 23:55 ` Alan Cox
2005-08-16 13:56 ` David Madore
[not found] ` <mailman.1124063520.13257.linux-kernel2news@redhat.com>
2005-08-16 20:17 ` Pete Zaitcev
2005-08-14 21:52 ` Kyle Moffett
2005-08-14 22:12 ` Valdis.Kletnieks
2005-08-15 8:01 ` Helge Hafting
2005-08-31 23:27 ` H. Peter Anvin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=432D15FA.8030100@v.loewis.de \
--to=martin@v.loewis.de \
--cc=bernd@firmix.at \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox