From: "Martin v. Löwis"
Date: Sat, 13 Aug 2005 14:07:34 +0200
To: linux-kernel@vger.kernel.org
Subject: [Patch] Support UTF-8 scripts
Message-ID: <42FDE286.40707@v.loewis.de>

This patch adds support for UTF-8 signatures (aka BOM, byte order mark)
to binfmt_script. Files that start with EF BB BF # ! are now recognized
as scripts (in addition to files starting with # !).

With such support, creating scripts that reliably carry non-ASCII
characters is simplified. Editors and the script interpreter can easily
agree on what the encoding of the script is, and the interpreter can
then render strings appropriately.

Currently, Python supports source files that start with the UTF-8
signature; the approach would naturally extend to Perl to enhance/replace
the "use utf8" pragma. Likewise, Tcl could use the UTF-8 signature to
reliably identify UTF-8 source code (instead of assuming
[encoding system] for source code).

Please find the patch attached below.

Regards,
Martin

Signed-off-by: Martin v. Löwis

diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -1,7 +1,7 @@
 /*
  *  linux/fs/binfmt_script.c
  *
- *  Copyright (C) 1996  Martin von Löwis
+ *  Copyright (C) 1996, 2005  Martin von Löwis
  *  original #!-checking implemented by tytso.
  */
 
@@ -23,7 +23,16 @@ static int load_script(struct linux_binp
 	char interp[BINPRM_BUF_SIZE];
 	int retval;
 
-	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || (bprm->sh_bang))
+	/* It is a recursive invocation. */
+	if (bprm->sh_bang)
+		return -ENOEXEC;
+
+	/* It starts neither with #!, nor with #! preceded by
+	   the UTF-8 signature. */
+	if (!(((bprm->buf[0] == '#') && (bprm->buf[1] == '!'))
+	      || ((bprm->buf[0] == '\xef') && (bprm->buf[1] == '\xbb')
+		  && (bprm->buf[2] == '\xbf') && (bprm->buf[3] == '#')
+		  && (bprm->buf[4] == '!'))))
 		return -ENOEXEC;
 	/*
 	 * This section does the #! interpretation.
@@ -46,7 +55,8 @@ static int load_script(struct linux_binp
 		else
 			break;
 	}
-	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+	cp = (bprm->buf[0]=='\xef') ? bprm->buf+5 : bprm->buf+2;
+	while ((*cp == ' ') || (*cp == '\t')) cp++;
 	if (*cp == '\0')
 		return -ENOEXEC; /* No interpreter name found */
 	i_name = cp;
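
For anyone who wants to try the matching rule outside the kernel, here is a
minimal user-space sketch of the same check. It is not part of the patch:
the names script_prefix_len and find_interp are hypothetical, and the sketch
assumes a NUL-terminated buffer and compares against unsigned bytes instead
of the '\xef' character constants used above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): return the number of bytes
 * to skip before the interpreter path, i.e. 2 for "#!", 5 for
 * EF BB BF "#!", or -1 if the buffer does not look like a script.
 */
int script_prefix_len(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);

	if (len >= 2 && buf[0] == '#' && buf[1] == '!')
		return 2;
	if (len >= 5 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF &&
	    buf[3] == '#' && buf[4] == '!')
		return 5;
	return -1;
}

/*
 * Hypothetical helper (not in the patch): skip the prefix and any
 * blanks or tabs after it, as the last hunk does, and return a pointer
 * to the interpreter name, or NULL if there is none.
 */
const char *find_interp(const char *line)
{
	int off = script_prefix_len((const unsigned char *)line, strlen(line));
	const char *cp;

	if (off < 0)
		return NULL;
	cp = line + off;
	while (*cp == ' ' || *cp == '\t')
		cp++;
	return *cp == '\0' ? NULL : cp;
}
```

Under these assumptions, find_interp() on a line starting with EF BB BF
followed by "#! /usr/bin/python" returns a pointer to "/usr/bin/python",
and a line that carries only the signature without "#!" is rejected.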