* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT [not found] ` <mailman.1097403036.11924@unix-os.sc.intel.com> @ 2004-10-11 21:05 ` Arun Sharma 2004-10-12 21:50 ` David Woodhouse 0 siblings, 1 reply; 8+ messages in thread From: Arun Sharma @ 2004-10-11 21:05 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-kernel, linux-ia64 [-- Attachment #1: Type: text/plain, Size: 1936 bytes --] [ cc'ing LKML - since this really belongs there ] Original thread on linux-ia64 about supporting alternate root for user level emulators: http://marc.theaimsgroup.com/?t=109708875700003&r=1&w=2 I've reattached the patch here for new readers. On 10/10/2004 3:10 AM, Christoph Hellwig wrote: > On Fri, Oct 08, 2004 at 03:37:10PM -0700, Arun Sharma wrote: >> On 10/8/2004 1:08 AM, David Mosberger wrote: >> >> >Unfortunately, that thread ran out in a rather unhelpful manner, as >> >far as I can see. Rusty seemed to agree that the performance-hit of >> >doing it all in user-level was unacceptably high, but I didn't see any >> >actual numbers. There was a suggestion to decouple the altroot from >> >the personality which makes some sense, but nobody actually did >> >anything about it? >> > >> >> I'd really like this issue to be resolved one way or the other. I'm not >> sure I've heard a convincing argument on why my original patch(which adds a >> new exec domain unconditionally) should not be applied. >> >> I'm fine with the attached patch to set the altroot via a system call as >> well. > > I'd still like to see some number on how much smaller a userland emulation > is. Best using qumu because that's opensource and we simply don't care > about intel's propritary stuff. Else your patch looks pretty okay - but > I'd need to go through linux-kernel and hooking up to all architectures. I've prototyped a generic userland solution that covers just open and stat system calls (for completeness, all path walk related syscalls need to be covered) using the LD_PRELOAD approach. I saw a 16% degradation in system time on this benchmark: find /usr/src/linux -name '*.[chS]' | xargs grep fsck mainly due to the doubling of the number of calls to open. Also, there was a slight increase in user time as well, due to malloc/free overhead. shar file with the benchmark and system call interposer attached. -Arun [-- Attachment #2: ia32-exec-domain.patch --] [-- Type: text/plain, Size: 2120 bytes --] Register the ia32 exec domain even though CONFIG_IA32_SUPPORT and CONFIG_COMPAT are turned off. This is necessary to support the alternate root feature for programs running with PER_LINUX32. Signed-off-by: Arun Sharma <arun.sharma@intel.com> --- linux26/include/asm-ia64/ia32.h- 2004-10-05 15:30:05.000000000 -0700 +++ linux26/include/asm-ia64/ia32.h 2004-10-05 15:31:11.000000000 -0700 @@ -11,6 +11,8 @@ #ifndef __ASSEMBLY__ +extern int register_ia32_exec_domain(void); + # ifdef CONFIG_IA32_SUPPORT extern void ia32_cpu_init (void); --- linux26/arch/ia64/kernel/process.c- 2004-10-05 15:20:04.000000000 -0700 +++ linux26/arch/ia64/kernel/process.c 2004-10-05 15:30:57.000000000 -0700 @@ -765,3 +765,19 @@ } EXPORT_SYMBOL(machine_power_off); + +struct exec_domain ia32_exec_domain; + +int __init +register_ia32_exec_domain() +{ + ia32_exec_domain.name = "Linux/x86"; + ia32_exec_domain.handler = NULL; + ia32_exec_domain.pers_low = PER_LINUX32; + ia32_exec_domain.pers_high = PER_LINUX32; + ia32_exec_domain.signal_map = default_exec_domain.signal_map; + ia32_exec_domain.signal_invmap = default_exec_domain.signal_invmap; + return register_exec_domain(&ia32_exec_domain); +} + +__initcall(register_ia32_exec_domain); --- linux26/arch/ia64/ia32/ia32_support.c- 2004-10-05 15:19:24.000000000 -0700 +++ linux26/arch/ia64/ia32/ia32_support.c 2004-10-05 16:37:50.000000000 -0700 @@ -29,7 +29,6 @@ extern void die_if_kernel (char *str, struct pt_regs *regs, long err); -struct exec_domain ia32_exec_domain; struct page *ia32_shared_page[NR_CPUS]; unsigned long *ia32_boot_gdt; unsigned long *cpu_gdt_table[NR_CPUS]; @@ -211,14 +210,6 @@ static int __init ia32_init (void) { - ia32_exec_domain.name = "Linux/x86"; - ia32_exec_domain.handler = NULL; - ia32_exec_domain.pers_low = PER_LINUX32; - ia32_exec_domain.pers_high = PER_LINUX32; - ia32_exec_domain.signal_map = default_exec_domain.signal_map; - ia32_exec_domain.signal_invmap = default_exec_domain.signal_invmap; - register_exec_domain(&ia32_exec_domain); - #if PAGE_SHIFT > IA32_PAGE_SHIFT { extern kmem_cache_t *partial_page_cachep; [-- Attachment #3: perf.shar.txt --] [-- Type: text/plain, Size: 8661 bytes --] #!/bin/sh # This is a shell archive (produced by GNU sharutils 4.2.1). # To extract the files from this archive, save it to some FILE, remove # everything before the `!/bin/sh' line above, then type `sh FILE'. # # Made on 2004-10-11 13:43 PDT by <adsharma@linux-t08.sc.intel.com>. # Source directory was `/home/adsharma'. # # Existing files will *not* be overwritten unless `-c' is specified. # # This shar contains: # length mode name # ------ ---------- ------------------------------------------ # 157 -rw-r--r-- perf/Makefile # 1227 -rw-r--r-- perf/namei.c # 56 -rwxr-xr-x perf/test1.sh # 88 -rwxr-xr-x perf/test.sh # 85 -rw-r--r-- perf/results.txt # save_IFS="${IFS}" IFS="${IFS}:" gettext_dir=FAILED locale_dir=FAILED first_param="$1" for dir in $PATH do if test "$gettext_dir" = FAILED && test -f $dir/gettext \ && ($dir/gettext --version >/dev/null 2>&1) then set `$dir/gettext --version 2>&1` if test "$3" = GNU then gettext_dir=$dir fi fi if test "$locale_dir" = FAILED && test -f $dir/shar \ && ($dir/shar --print-text-domain-dir >/dev/null 2>&1) then locale_dir=`$dir/shar --print-text-domain-dir` fi done IFS="$save_IFS" if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED then echo=echo else TEXTDOMAINDIR=$locale_dir export TEXTDOMAINDIR TEXTDOMAIN=sharutils export TEXTDOMAIN echo="$gettext_dir/gettext -s" fi if touch -am -t 200112312359.59 $$.touch >/dev/null 2>&1 && test ! -f 200112312359.59 -a -f $$.touch; then shar_touch='touch -am -t $1$2$3$4$5$6.$7 "$8"' elif touch -am 123123592001.59 $$.touch >/dev/null 2>&1 && test ! -f 123123592001.59 -a ! -f 123123592001.5 -a -f $$.touch; then shar_touch='touch -am $3$4$5$6$1$2.$7 "$8"' elif touch -am 1231235901 $$.touch >/dev/null 2>&1 && test ! -f 1231235901 -a -f $$.touch; then shar_touch='touch -am $3$4$5$6$2 "$8"' else shar_touch=: echo $echo 'WARNING: not restoring timestamps. Consider getting and' $echo "installing GNU \`touch', distributed in GNU File Utilities..." echo fi rm -f 200112312359.59 123123592001.59 123123592001.5 1231235901 $$.touch # if mkdir _sh21367; then $echo 'x -' 'creating lock directory' else $echo 'failed to create lock directory' exit 1 fi # ============= perf/Makefile ============== if test ! -d 'perf'; then $echo 'x -' 'creating directory' 'perf' mkdir 'perf' fi if test -f 'perf/Makefile' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'perf/Makefile' '(file already exists)' else $echo 'x -' extracting 'perf/Makefile' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'perf/Makefile' && CFLAGS = -D_GNU_SOURCE X all: libaltroot.so X libaltroot.so: namei.o X gcc -shared -g -fpic namei.o -o libaltroot.so -ldl X clean: X $(RM) namei.o libaltroot.so SHAR_EOF (set 20 04 10 11 13 43 04 'perf/Makefile'; eval "$shar_touch") && chmod 0644 'perf/Makefile' || $echo 'restore of' 'perf/Makefile' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'perf/Makefile:' 'MD5 check failed' c04da369bdfbb9e9a73346b393ac99a0 perf/Makefile SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/Makefile'`" test 157 -eq "$shar_count" || $echo 'perf/Makefile:' 'original size' '157,' 'current size' "$shar_count!" fi fi # ============= perf/namei.c ============== if test -f 'perf/namei.c' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'perf/namei.c' '(file already exists)' else $echo 'x -' extracting 'perf/namei.c' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'perf/namei.c' && #include <stdio.h> #include <stdlib.h> #include <signal.h> #include <dlfcn.h> #include <sys/types.h> #include <sys/stat.h> #include <linux/limits.h> #include <fcntl.h> X #define EMULATE 1 X int open(const char *pathname, int flags, ...) { X static int (*func) (const char *, int, mode_t); X int ret; X mode_t mode; X char *buf; X X if (!func) X func = (int (*)()) dlsym(RTLD_NEXT, "open"); X #ifdef EMULATE X if (*pathname == '/') { X buf = malloc(PATH_MAX); X strncpy(buf, "/emul/ia32-linux", PATH_MAX); X strncat(buf, pathname, PATH_MAX-16); X ret = func(buf, flags, mode); X if (ret != -1) { X free(buf); X return ret; X } X free(buf); X } #endif X X ret = func(pathname, flags, mode); X X return ret; } X int stat(const char *pathname, struct stat *buf1) { X static int (*func) (const char *, struct stat *); X int ret; X char *buf; X X if (!func) X func = (int (*)()) dlsym(RTLD_NEXT, "stat"); X #ifdef EMULATE X if (*pathname == '/') { X buf = malloc(PATH_MAX); X strncpy(buf, "/emul/ia32-linux", PATH_MAX); X strncat(buf, pathname, PATH_MAX-16); X ret = func(buf, buf1); X if (ret != -1) { X free(buf); X return ret; X } X free(buf); X } #endif X X ret = func(pathname, buf1); X X return ret; } SHAR_EOF (set 20 04 10 11 13 28 34 'perf/namei.c'; eval "$shar_touch") && chmod 0644 'perf/namei.c' || $echo 'restore of' 'perf/namei.c' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'perf/namei.c:' 'MD5 check failed' 968c756b5664cc2125b842b1750e40a8 perf/namei.c SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/namei.c'`" test 1227 -eq "$shar_count" || $echo 'perf/namei.c:' 'original size' '1227,' 'current size' "$shar_count!" fi fi # ============= perf/test1.sh ============== if test -f 'perf/test1.sh' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'perf/test1.sh' '(file already exists)' else $echo 'x -' extracting 'perf/test1.sh' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'perf/test1.sh' && #!/bin/sh X find `pwd` -name '*.[chS]' | xargs grep fsck SHAR_EOF (set 20 04 10 11 13 41 25 'perf/test1.sh'; eval "$shar_touch") && chmod 0755 'perf/test1.sh' || $echo 'restore of' 'perf/test1.sh' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'perf/test1.sh:' 'MD5 check failed' bd830f72d5ade56e00a8c5c4f0e4c5fe perf/test1.sh SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/test1.sh'`" test 56 -eq "$shar_count" || $echo 'perf/test1.sh:' 'original size' '56,' 'current size' "$shar_count!" fi fi # ============= perf/test.sh ============== if test -f 'perf/test.sh' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'perf/test.sh' '(file already exists)' else $echo 'x -' extracting 'perf/test.sh' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'perf/test.sh' && #!/bin/sh X find `pwd` -name '*.[chS]' | LD_PRELOAD=~/perf/libaltroot.so xargs grep fsck SHAR_EOF (set 20 04 10 11 13 41 25 'perf/test.sh'; eval "$shar_touch") && chmod 0755 'perf/test.sh' || $echo 'restore of' 'perf/test.sh' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'perf/test.sh:' 'MD5 check failed' 005c01ad066d70c7be266f1bec9074f6 perf/test.sh SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/test.sh'`" test 88 -eq "$shar_count" || $echo 'perf/test.sh:' 'original size' '88,' 'current size' "$shar_count!" fi fi # ============= perf/results.txt ============== if test -f 'perf/results.txt' && test "$first_param" != -c; then $echo 'x -' SKIPPING 'perf/results.txt' '(file already exists)' else $echo 'x -' extracting 'perf/results.txt' '(text)' sed 's/^X//' << 'SHAR_EOF' > 'perf/results.txt' && test1.sh: X real 14.49 user 14.09 sys 0.37 X test.sh: X real 14.68 user 14.13 sys 0.43 X SHAR_EOF (set 20 04 10 11 13 42 04 'perf/results.txt'; eval "$shar_touch") && chmod 0644 'perf/results.txt' || $echo 'restore of' 'perf/results.txt' 'failed' if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then md5sum -c << SHAR_EOF >/dev/null 2>&1 \ || $echo 'perf/results.txt:' 'MD5 check failed' db123bdf3d15d40c363be0924ccb2b6d perf/results.txt SHAR_EOF else shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/results.txt'`" test 85 -eq "$shar_count" || $echo 'perf/results.txt:' 'original size' '85,' 'current size' "$shar_count!" fi fi rm -fr _sh21367 exit 0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-11 21:05 ` [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT Arun Sharma @ 2004-10-12 21:50 ` David Woodhouse 2004-10-12 22:46 ` Arun Sharma 0 siblings, 1 reply; 8+ messages in thread From: David Woodhouse @ 2004-10-12 21:50 UTC (permalink / raw) To: Arun Sharma; +Cc: Christoph Hellwig, linux-kernel, linux-ia64 On Mon, 2004-10-11 at 14:05 -0700, Arun Sharma wrote: > I've prototyped a generic userland solution that covers just open and > stat system calls (for completeness, all path walk related syscalls > need to be covered) using the LD_PRELOAD approach. > > I saw a 16% degradation in system time on this benchmark: > > find /usr/src/linux -name '*.[chS]' | xargs grep fsck > > mainly due to the doubling of the number of calls to open. Also, there > was a slight increase in user time as well, due to malloc/free > overhead. The patch is entirely bogus. This isn't at all ia64-specific, and doesn't live in arch/ia64. It's just as applicable on _all_ systems where we may want to do CPU or OS emulation. If you make it generic so that qemu can use it for emulating i386 even on machines like ppc64, perhaps it would be saner. -- dwmw2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-12 21:50 ` David Woodhouse @ 2004-10-12 22:46 ` Arun Sharma 2004-10-13 22:27 ` Arun Sharma 2004-10-14 8:50 ` Jakub Jelinek 0 siblings, 2 replies; 8+ messages in thread From: Arun Sharma @ 2004-10-12 22:46 UTC (permalink / raw) To: David Woodhouse; +Cc: Christoph Hellwig, linux-kernel, linux-ia64 [-- Attachment #1: Type: text/plain, Size: 970 bytes --] David Woodhouse wrote: > If you make it generic so that qemu can use it for emulating i386 even > on machines like ppc64, perhaps it would be saner. > There is already an architecture independent solution to the problem - set_fs_altroot(). The only problem is that it's not easily accessible from userland. Accessing it via set_personality() works only if there are multiple exec domains. I can't see how anyone would accept a patch that adds linux-i386 exec domain in generic code. Christoph doesn't like the idea of adding exec-domains just for this purpose and has suggested adding a new system call to set the altroot. A prototype patch to do this already exists. I will be cleaning it up and posting it to LKML later this week. The main purpose of moving the discussion to LKML was to see how receptive people were to the proposed new system call. So far I haven't heard anyone complaining :) -Arun PS: Prototype patch with known bugs attached. [-- Attachment #2: sys-altroot.patch --] [-- Type: text/plain, Size: 4045 bytes --] Index: linux-2.6-cvs/arch/ia64/kernel/entry.S =================================================================== RCS file: /home/adsharma/disk2/cvs/linux-2.5/arch/ia64/kernel/entry.S,v retrieving revision 1.49 diff -u -r1.49 entry.S --- linux-2.6-cvs/arch/ia64/kernel/entry.S 24 Aug 2004 18:27:07 -0000 1.49 +++ linux-2.6-cvs/arch/ia64/kernel/entry.S 8 Oct 2004 17:35:53 -0000 @@ -1527,7 +1527,7 @@ data8 sys_mq_getsetattr data8 sys_ni_syscall // reserved for kexec_load data8 sys_ni_syscall - data8 sys_ni_syscall // 1270 + data8 sys_altroot // 1270 data8 sys_ni_syscall data8 sys_ni_syscall data8 sys_ni_syscall Index: linux-2.6-cvs/fs/namei.c =================================================================== RCS file: /home/adsharma/disk2/cvs/linux-2.5/fs/namei.c,v retrieving revision 1.110 diff -u -r1.110 namei.c --- linux-2.6-cvs/fs/namei.c 2 Oct 2004 17:59:55 -0000 1.110 +++ linux-2.6-cvs/fs/namei.c 8 Oct 2004 18:14:11 -0000 @@ -897,7 +897,7 @@ return 1; } -void set_fs_altroot(void) +int set_fs_altroot(const char __user *altroot) { char *emul = __emul_prefix(); struct nameidata nd; @@ -905,12 +905,20 @@ struct dentry *dentry = NULL, *olddentry; int err; + if (altroot) { + emul = getname(altroot); + if (IS_ERR(emul)) + return PTR_ERR(emul); + } + if (!emul) goto set_it; err = path_lookup(emul, LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_NOALT, &nd); if (!err) { mnt = nd.mnt; dentry = nd.dentry; + } else { + return err; } set_it: write_lock(¤t->fs->lock); @@ -923,6 +931,33 @@ dput(olddentry); mntput(oldmnt); } + return 0; +} + +asmlinkage int sys_altroot(const char __user * altroot) +{ + if (atomic_read(¤t->fs->count) != 1) { + struct fs_struct *fsp, *ofsp; + + fsp = copy_fs_struct(current->fs); + if (fsp == NULL) { + return -ENOMEM; + } + + task_lock(current); + ofsp = current->fs; + current->fs = fsp; + task_unlock(current); + + put_fs_struct(ofsp); + } + + /* + * At that point we are guaranteed to be the sole owner of + * current->fs. + */ + + return set_fs_altroot(altroot); } int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd) Index: linux-2.6-cvs/fs/open.c =================================================================== RCS file: /home/adsharma/disk2/cvs/linux-2.5/fs/open.c,v retrieving revision 1.71 diff -u -r1.71 open.c --- linux-2.6-cvs/fs/open.c 8 Aug 2004 01:54:18 -0000 1.71 +++ linux-2.6-cvs/fs/open.c 8 Oct 2004 17:38:03 -0000 @@ -584,7 +584,7 @@ goto dput_and_out; set_fs_root(current->fs, nd.mnt, nd.dentry); - set_fs_altroot(); + set_fs_altroot(NULL); error = 0; dput_and_out: path_release(&nd); Index: linux-2.6-cvs/include/linux/fs_struct.h =================================================================== RCS file: /home/adsharma/disk2/cvs/linux-2.5/include/linux/fs_struct.h,v retrieving revision 1.7 diff -u -r1.7 fs_struct.h --- linux-2.6-cvs/include/linux/fs_struct.h 17 Nov 2002 03:52:23 -0000 1.7 +++ linux-2.6-cvs/include/linux/fs_struct.h 8 Oct 2004 17:39:16 -0000 @@ -19,7 +19,7 @@ } extern void exit_fs(struct task_struct *); -extern void set_fs_altroot(void); +extern int set_fs_altroot(const char __user *); extern void set_fs_root(struct fs_struct *, struct vfsmount *, struct dentry *); extern void set_fs_pwd(struct fs_struct *, struct vfsmount *, struct dentry *); extern struct fs_struct *copy_fs_struct(struct fs_struct *); Index: linux-2.6-cvs/kernel/exec_domain.c =================================================================== RCS file: /home/adsharma/disk2/cvs/linux-2.5/kernel/exec_domain.c,v retrieving revision 1.18 diff -u -r1.18 exec_domain.c --- linux-2.6-cvs/kernel/exec_domain.c 8 Sep 2004 14:50:53 -0000 1.18 +++ linux-2.6-cvs/kernel/exec_domain.c 8 Oct 2004 17:33:00 -0000 @@ -167,7 +167,7 @@ current->personality = personality; oep = current_thread_info()->exec_domain; current_thread_info()->exec_domain = ep; - set_fs_altroot(); + set_fs_altroot(NULL); module_put(oep->module); return 0; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-12 22:46 ` Arun Sharma @ 2004-10-13 22:27 ` Arun Sharma 2004-10-14 7:32 ` David Mosberger 2004-10-14 8:50 ` Jakub Jelinek 1 sibling, 1 reply; 8+ messages in thread From: Arun Sharma @ 2004-10-13 22:27 UTC (permalink / raw) To: David Woodhouse; +Cc: Christoph Hellwig, linux-kernel, linux-ia64 [-- Attachment #1: Type: text/plain, Size: 527 bytes --] Arun Sharma wrote: > Christoph doesn't like the idea of adding exec-domains just for this > purpose and has suggested adding a new system call to set the altroot. A > prototype patch to do this already exists. I will be cleaning it up and > posting it to LKML later this week. The main purpose of moving the > discussion to LKML was to see how receptive people were to the proposed > new system call. > Attached is the promised patch. It addresses Christoph's comments and fixes the bug Tony found as well. -Arun [-- Attachment #2: sys_altroot.txt --] [-- Type: text/plain, Size: 3034 bytes --] Add a new system call sys_altroot. This allows using the altroot feature on systems where there is only one exec domain. Signed-off-by: Zou Nanhai <nanhai.zou@intel.com> Signed-off-by: Gordon Jin <gordon.jin@intel.com> Signed-off-by: Arun Sharma <arun.sharma@intel.com> diff -Nraup a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S --- a/arch/ia64/kernel/entry.S 2004-10-12 09:56:51.408496174 -0700 +++ b/arch/ia64/kernel/entry.S 2004-10-12 09:58:17.362596684 -0700 @@ -1527,7 +1527,7 @@ sys_call_table: data8 sys_mq_getsetattr data8 sys_ni_syscall // reserved for kexec_load data8 sys_ni_syscall - data8 sys_ni_syscall // 1270 + data8 sys_setaltroot // 1270 data8 sys_ni_syscall data8 sys_ni_syscall data8 sys_ni_syscall diff -Nraup a/fs/namei.c b/fs/namei.c --- a/fs/namei.c 2004-10-12 09:56:56.895800795 -0700 +++ b/fs/namei.c 2004-10-12 09:58:33.524705861 -0700 @@ -897,20 +897,20 @@ static int __emul_lookup_dentry(const ch return 1; } -void set_fs_altroot(void) +int __set_fs_altroot(const char *altroot) { - char *emul = __emul_prefix(); struct nameidata nd; struct vfsmount *mnt = NULL, *oldmnt; struct dentry *dentry = NULL, *olddentry; int err; - - if (!emul) + if (!altroot) goto set_it; - err = path_lookup(emul, LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_NOALT, &nd); + err = path_lookup(altroot, LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_NOALT, &nd); if (!err) { mnt = nd.mnt; dentry = nd.dentry; + } else { + return err; } set_it: write_lock(¤t->fs->lock); @@ -923,6 +923,58 @@ set_it: dput(olddentry); mntput(oldmnt); } + return 0; +} + +void set_fs_altroot() +{ + char *emul = __emul_prefix(); + + __set_fs_altroot(emul); +} + +asmlinkage long sys_setaltroot(const char __user * altroot) +{ + char *emul = NULL; + int ret; + + if (altroot) { + emul = getname(altroot); + if (IS_ERR(emul)) { + ret = PTR_ERR(emul); + goto out; + } + } + + if (atomic_read(¤t->fs->count) != 1) { + struct fs_struct *fsp, *ofsp; + + fsp = copy_fs_struct(current->fs); + if (fsp == NULL) { + ret = -ENOMEM; + goto out_putname; + } + + task_lock(current); + ofsp = current->fs; + current->fs = fsp; + task_unlock(current); + + put_fs_struct(ofsp); + } + + /* + * At that point we are guaranteed to be the sole owner of + * current->fs. + */ + + ret = __set_fs_altroot(emul); + +out_putname: + if (emul) + putname(emul); +out: + return ret; } int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd) diff -Nraup a/include/linux/syscalls.h b/include/linux/syscalls.h --- a/include/linux/syscalls.h 2004-10-12 09:56:58.124316405 -0700 +++ b/include/linux/syscalls.h 2004-10-12 09:58:17.362596684 -0700 @@ -489,6 +489,7 @@ asmlinkage long sys_nfsservctl(int cmd, void __user *res); asmlinkage long sys_syslog(int type, char __user *buf, int len); asmlinkage long sys_uselib(const char __user *library); +asmlinkage long sys_setaltroot(const char __user *altroot); asmlinkage long sys_ni_syscall(void); #endif ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-13 22:27 ` Arun Sharma @ 2004-10-14 7:32 ` David Mosberger 2004-10-14 8:25 ` David Woodhouse 0 siblings, 1 reply; 8+ messages in thread From: David Mosberger @ 2004-10-14 7:32 UTC (permalink / raw) To: Arun Sharma; +Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-ia64 >>>>> On Wed, 13 Oct 2004 15:27:05 -0700, Arun Sharma <arun.sharma@intel.com> said: Arun> Arun Sharma wrote: >> Christoph doesn't like the idea of adding exec-domains just for >> this purpose and has suggested adding a new system call to set >> the altroot. A prototype patch to do this already exists. I will >> be cleaning it up and posting it to LKML later this week. The >> main purpose of moving the discussion to LKML was to see how >> receptive people were to the proposed new system call. Arun> Attached is the promised patch. It addresses Christoph's Arun> comments and fixes the bug Tony found as well. I like it. Once it's in (surely it will go in, no? ;-), we can change asm-ia64/namei.h to define a non-NULL __emul_prefix() only if CONFIG_IA32_SUPPORT is defined. One less hardcoded path to worry about! --david ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-14 7:32 ` David Mosberger @ 2004-10-14 8:25 ` David Woodhouse 0 siblings, 0 replies; 8+ messages in thread From: David Woodhouse @ 2004-10-14 8:25 UTC (permalink / raw) To: davidm; +Cc: Arun Sharma, Christoph Hellwig, linux-kernel, linux-ia64 On Thu, 2004-10-14 at 00:32 -0700, David Mosberger wrote: > Arun> Attached is the promised patch. It addresses Christoph's > Arun> comments and fixes the bug Tony found as well. > > I like it. Once it's in (surely it will go in, no? ;-), It certainly looks saner, yes. If this is the way we're going, then the new syscall needs to be added on all other architectures, and I'd suggest that we should see a real open-source user of it too before it's merged. Does anyone have the qemu patches? -- dwmw2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-12 22:46 ` Arun Sharma 2004-10-13 22:27 ` Arun Sharma @ 2004-10-14 8:50 ` Jakub Jelinek 2004-10-14 17:53 ` Arun Sharma 1 sibling, 1 reply; 8+ messages in thread From: Jakub Jelinek @ 2004-10-14 8:50 UTC (permalink / raw) To: Arun Sharma; +Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-ia64 On Tue, Oct 12, 2004 at 03:46:39PM -0700, Arun Sharma wrote: > --- linux-2.6-cvs/fs/namei.c 2 Oct 2004 17:59:55 -0000 1.110 > +++ linux-2.6-cvs/fs/namei.c 8 Oct 2004 18:14:11 -0000 > @@ -897,7 +897,7 @@ > return 1; > } > > -void set_fs_altroot(void) > +int set_fs_altroot(const char __user *altroot) > { > char *emul = __emul_prefix(); I think you should change this to char *emul; > struct nameidata nd; > @@ -905,12 +905,20 @@ > struct dentry *dentry = NULL, *olddentry; > int err; > > + if (altroot) { > + emul = getname(altroot); > + if (IS_ERR(emul)) > + return PTR_ERR(emul); > + } > + and add here else emul = __emul_prefix(); There is no point in calling __emul_prefix () when it will be thrown away. Jakub ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT 2004-10-14 8:50 ` Jakub Jelinek @ 2004-10-14 17:53 ` Arun Sharma 0 siblings, 0 replies; 8+ messages in thread From: Arun Sharma @ 2004-10-14 17:53 UTC (permalink / raw) To: Jakub Jelinek Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-ia64 On 10/14/2004 1:50 AM, Jakub Jelinek wrote: > There is no point in calling __emul_prefix () when it will be thrown away. This is addressed in the most recent patch posted to the mailing list. -Arun ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2004-10-14 18:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <41643EC0.1010505@intel.com>
[not found] ` <20041007142710.A12688@infradead.org>
[not found] ` <4165D4C9.2040804@intel.com>
[not found] ` <mailman.1097223239.25078@unix-os.sc.intel.com>
[not found] ` <41671696.1060706@intel.com>
[not found] ` <mailman.1097403036.11924@unix-os.sc.intel.com>
2004-10-11 21:05 ` [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT Arun Sharma
2004-10-12 21:50 ` David Woodhouse
2004-10-12 22:46 ` Arun Sharma
2004-10-13 22:27 ` Arun Sharma
2004-10-14 7:32 ` David Mosberger
2004-10-14 8:25 ` David Woodhouse
2004-10-14 8:50 ` Jakub Jelinek
2004-10-14 17:53 ` Arun Sharma
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox