linux-api.vger.kernel.org archive mirror
* + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
@ 2009-02-02 21:35 akpm
       [not found] ` <200902022135.n12LZa1a010673-AB4EexQrvXRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
  2009-02-02 21:51 ` H. Peter Anvin
  0 siblings, 2 replies; 10+ messages in thread
From: akpm @ 2009-02-02 21:35 UTC (permalink / raw)
  To: mm-commits
  Cc: kraxel, arnd, hpa, linux-api, linux-arch, mingo, ralf, tglx, viro


The patch titled
     preadv/pwritev: Add preadv and pwritev system calls.
has been added to the -mm tree.  Its filename is
     preadv-pwritev-add-preadv-and-pwritev-system-calls.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: preadv/pwritev: Add preadv and pwritev system calls.
From: Gerd Hoffmann <kraxel@redhat.com>

This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write). 
They are quite useful for doing vectored I/O in threaded applications. 
Using lseek+readv instead opens race windows you'll have to plug with
locking.
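
As a rough illustration of the race-free call described above, here is a
minimal user-space sketch.  It assumes a libc that already exposes preadv()
through <sys/uio.h> with the BSD-style prototype; read_record() and its
4-byte/8-byte record layout are hypothetical names chosen for the example
and are not part of this patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* glibc hides preadv() without this */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a 4-byte header and an 8-byte body that sit
 * back to back in the file at `offset`, scattering them into two buffers.
 * The offset is passed with the call, so the shared file position of `fd`
 * is neither used nor modified.
 */
static ssize_t read_record(int fd, off_t offset, char hdr[4], char body[8])
{
	struct iovec iov[2];

	assert(hdr != NULL && body != NULL);

	iov[0].iov_base = hdr;
	iov[0].iov_len = 4;
	iov[1].iov_base = body;
	iov[1].iov_len = 8;

	/* One call replaces the lseek() + readv() pair. */
	return preadv(fd, iov, 2, offset);
}
```

Because the offset travels with the call, two threads can use read_record()
on the same descriptor at once; neither moves the shared file position, so
no lock around an lseek()+readv() pair is needed.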

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: on 32-bit archs the (64-bit)
offset argument is unaligned, which the syscall ABI of several archs doesn't
allow.  At least s390 needs a wrapper in glibc to handle this.  As
we'll need wrappers in glibc anyway, I've decided to push the problem to
glibc entirely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: the offset argument is
explicitly split into two 32-bit values.
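
The split can be sketched in plain C as follows.  split_offset() and
join_offset() are hypothetical helpers, not glibc or kernel functions;
join_offset() mirrors the recombination the patch performs at the top of
sys_preadv() and sys_pwritev().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair holding the two 32-bit halves of a 64-bit offset. */
struct split_off {
	uint32_t high;
	uint32_t low;
};

/* What a libc wrapper would do before entering the kernel. */
static struct split_off split_offset(int64_t offset)
{
	struct split_off s;

	assert(offset >= 0);	/* negative offsets are rejected anyway */
	s.high = (uint32_t)((uint64_t)offset >> 32);
	s.low = (uint32_t)((uint64_t)offset & 0xffffffffu);
	return s;
}

/*
 * What the kernel side does with the two halves, modelled on
 *   loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 * The shift is done on an unsigned value here to keep the sketch free of
 * signed-overflow concerns in user space.
 */
static int64_t join_offset(uint32_t high, uint32_t low)
{
	return (int64_t)(((uint64_t)high << 32) | low);
}
```

An offset whose high word has the top bit set recombines to a negative
value, which the patch rejects with -EINVAL.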

This patch contains the actual system call implementation and the wiring
into the x86 system call tables.  Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/ia32/ia32entry.S          |    2 +
 arch/x86/include/asm/unistd_32.h   |    2 +
 arch/x86/include/asm/unistd_64.h   |    4 ++
 arch/x86/kernel/syscall_table_32.S |    2 +
 fs/compat.c                        |   36 +++++++++++++++++++
 fs/read_write.c                    |   50 +++++++++++++++++++++++++++
 include/linux/compat.h             |    6 +++
 include/linux/syscalls.h           |    4 ++
 8 files changed, 106 insertions(+)

diff -puN arch/x86/ia32/ia32entry.S~preadv-pwritev-add-preadv-and-pwritev-system-calls arch/x86/ia32/ia32entry.S
--- a/arch/x86/ia32/ia32entry.S~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/arch/x86/ia32/ia32entry.S
@@ -826,4 +826,6 @@ ia32_sys_call_table:
 	.quad sys_dup3			/* 330 */
 	.quad sys_pipe2
 	.quad sys_inotify_init1
+	.quad compat_sys_preadv
+	.quad compat_sys_pwritev
 ia32_syscall_end:
diff -puN arch/x86/include/asm/unistd_32.h~preadv-pwritev-add-preadv-and-pwritev-system-calls arch/x86/include/asm/unistd_32.h
--- a/arch/x86/include/asm/unistd_32.h~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/arch/x86/include/asm/unistd_32.h
@@ -338,6 +338,8 @@
 #define __NR_dup3		330
 #define __NR_pipe2		331
 #define __NR_inotify_init1	332
+#define __NR_preadv		333
+#define __NR_pwritev		334
 
 #ifdef __KERNEL__
 
diff -puN arch/x86/include/asm/unistd_64.h~preadv-pwritev-add-preadv-and-pwritev-system-calls arch/x86/include/asm/unistd_64.h
--- a/arch/x86/include/asm/unistd_64.h~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/arch/x86/include/asm/unistd_64.h
@@ -653,6 +653,10 @@ __SYSCALL(__NR_dup3, sys_dup3)
 __SYSCALL(__NR_pipe2, sys_pipe2)
 #define __NR_inotify_init1			294
 __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
+#define __NR_preadv				295
+__SYSCALL(__NR_preadv, sys_preadv)
+#define __NR_pwritev				296
+__SYSCALL(__NR_pwritev, sys_pwritev)
 
 
 #ifndef __NO_STUBS
diff -puN arch/x86/kernel/syscall_table_32.S~preadv-pwritev-add-preadv-and-pwritev-system-calls arch/x86/kernel/syscall_table_32.S
--- a/arch/x86/kernel/syscall_table_32.S~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/arch/x86/kernel/syscall_table_32.S
@@ -332,3 +332,5 @@ ENTRY(sys_call_table)
 	.long sys_dup3			/* 330 */
 	.long sys_pipe2
 	.long sys_inotify_init1
+	.long sys_preadv
+	.long sys_pwritev
diff -puN fs/compat.c~preadv-pwritev-add-preadv-and-pwritev-system-calls fs/compat.c
--- a/fs/compat.c~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/fs/compat.c
@@ -1204,6 +1204,24 @@ compat_sys_readv(unsigned long fd, const
 	return ret;
 }
 
+asmlinkage ssize_t
+compat_sys_preadv(unsigned long fd, const struct compat_iovec __user *vec,
+		  unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret;
+
+	if (pos < 0)
+		return -EINVAL;
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	ret = compat_readv(file, vec, vlen, &pos);
+	fput(file);
+	return ret;
+}
+
 static size_t compat_writev(struct file *file,
 			    const struct compat_iovec __user *vec,
 			    unsigned long vlen, loff_t *pos)
@@ -1241,6 +1259,24 @@ compat_sys_writev(unsigned long fd, cons
 	return ret;
 }
 
+asmlinkage ssize_t
+compat_sys_pwritev(unsigned long fd, const struct compat_iovec __user *vec,
+		   unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret;
+
+	if (pos < 0)
+		return -EINVAL;
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	ret = compat_writev(file, vec, vlen, &pos);
+	fput(file);
+	return ret;
+}
+
 asmlinkage long
 compat_sys_vmsplice(int fd, const struct compat_iovec __user *iov32,
 		    unsigned int nr_segs, unsigned int flags)
diff -puN fs/read_write.c~preadv-pwritev-add-preadv-and-pwritev-system-calls fs/read_write.c
--- a/fs/read_write.c~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/fs/read_write.c
@@ -731,6 +731,56 @@ SYSCALL_DEFINE3(writev, unsigned long, f
 	return ret;
 }
 
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, u32, pos_high, u32, pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PREAD)
+			ret = vfs_readv(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
+	return ret;
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, u32, pos_high, u32, pos_low)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PWRITE)
+			ret = vfs_writev(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
+	return ret;
+}
+
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {
diff -puN include/linux/compat.h~preadv-pwritev-add-preadv-and-pwritev-system-calls include/linux/compat.h
--- a/include/linux/compat.h~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/include/linux/compat.h
@@ -183,6 +183,12 @@ asmlinkage ssize_t compat_sys_readv(unsi
 		const struct compat_iovec __user *vec, unsigned long vlen);
 asmlinkage ssize_t compat_sys_writev(unsigned long fd,
 		const struct compat_iovec __user *vec, unsigned long vlen);
+asmlinkage ssize_t compat_sys_preadv(unsigned long fd,
+		const struct compat_iovec __user *vec,
+		unsigned long vlen, u32 pos_high, u32 pos_low);
+asmlinkage ssize_t compat_sys_pwritev(unsigned long fd,
+		const struct compat_iovec __user *vec,
+		unsigned long vlen, u32 pos_high, u32 pos_low);
 
 int compat_do_execve(char * filename, compat_uptr_t __user *argv,
 	        compat_uptr_t __user *envp, struct pt_regs * regs);
diff -puN include/linux/syscalls.h~preadv-pwritev-add-preadv-and-pwritev-system-calls include/linux/syscalls.h
--- a/include/linux/syscalls.h~preadv-pwritev-add-preadv-and-pwritev-system-calls
+++ a/include/linux/syscalls.h
@@ -461,6 +461,10 @@ asmlinkage long sys_pread64(unsigned int
 			    size_t count, loff_t pos);
 asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
+asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
+			   unsigned long vlen, u32 pos_high, u32 pos_low);
+asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, u32 pos_high, u32 pos_low);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, int mode);
 asmlinkage long sys_chdir(const char __user *filename);
_

Patches currently in -mm which might be from kraxel@redhat.com are

linux-next.patch
preadv-pwritev-create-compat_readv.patch
preadv-pwritev-create-compat_writev.patch
preadv-pwritev-add-preadv-and-pwritev-system-calls.patch
preadv-pwritev-mips-add-preadv2-and-pwritev2-syscalls.patch
preadv-pwritev-switch-compat-readv-preadv-writev-pwritev-from-fget-to-fget_light.patch


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
       [not found] ` <200902022135.n12LZa1a010673-AB4EexQrvXRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
@ 2009-02-02 21:50   ` H. Peter Anvin
       [not found]     ` <49876A91.4000705-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2009-02-02 21:50 UTC (permalink / raw)
  To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mm-commits-u79uwXL29TY76Z2rM5mHXA, kraxel-H+wXaHxf7aLQT0dZR+AlfA,
	arnd-r2nGTMty4D4, linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, mingo-X9Un+BFzKDI,
	ralf-6z/3iImG2C8G8FEW9MqTrA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:
> 
> This prototype has one problem though: on 32-bit archs the (64-bit)
> offset argument is unaligned, which the syscall ABI of several archs doesn't
> allow.  At least s390 needs a wrapper in glibc to handle this.  As
> we'll need wrappers in glibc anyway, I've decided to push the problem to
> glibc entirely and use a syscall prototype which works without
> arch-specific wrappers inside the kernel: the offset argument is
> explicitly split into two 32-bit values.
> 

That rather sucks.  It'd be cleaner to just shuffle the argument order.

	-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
  2009-02-02 21:35 + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree akpm
       [not found] ` <200902022135.n12LZa1a010673-AB4EexQrvXRQetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
@ 2009-02-02 21:51 ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2009-02-02 21:51 UTC (permalink / raw)
  To: akpm
  Cc: mm-commits, kraxel, arnd, linux-api, linux-arch, mingo, ralf,
	tglx, viro

> 
> This prototype has one problem though: on 32-bit archs the (64-bit)
> offset argument is unaligned, which the syscall ABI of several archs doesn't
> allow.  At least s390 needs a wrapper in glibc to handle this.  As
> we'll need wrappers in glibc anyway, I've decided to push the problem to
> glibc entirely and use a syscall prototype which works without
> arch-specific wrappers inside the kernel: the offset argument is
> explicitly split into two 32-bit values.
> 

This is also an excellent example of why having syscall stubs 
autogenerated would be a very good thing, to avoid these kinds of 
abortions polluting the ABIs of *every* architecture...

	-hpa


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
       [not found]     ` <49876A91.4000705-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-02-03 10:14       ` Gerd Hoffmann
       [not found]         ` <49881904.30705-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Gerd Hoffmann @ 2009-02-03 10:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, mingo-X9Un+BFzKDI,
	ralf-6z/3iImG2C8G8FEW9MqTrA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

H. Peter Anvin wrote:
> akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:
>>
>> This prototype has one problem though: on 32-bit archs the (64-bit)
>> offset argument is unaligned, which the syscall ABI of several archs doesn't
>> allow.  At least s390 needs a wrapper in glibc to handle this.  As
>> we'll need wrappers in glibc anyway, I've decided to push the problem to
>> glibc entirely and use a syscall prototype which works without
>> arch-specific wrappers inside the kernel: the offset argument is
>> explicitly split into two 32-bit values.
> 
> That rather sucks.  It'd be cleaner to just shuffle the argument order.

That was discussed too.  Doesn't solve the problem that you need some
wrap-o-magic in glibc (this time to swap arguments instead of splitting
the offset).  And it has the drawback that it is confusing to have
different argument ordering at application and syscall level (think
strace).  Check the archives for the details.

cheers,
  Gerd


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
       [not found]         ` <49881904.30705-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-02-03 17:06           ` H. Peter Anvin
       [not found]             ` <498879B3.8030002-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2009-02-03 17:06 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, mingo-X9Un+BFzKDI,
	ralf-6z/3iImG2C8G8FEW9MqTrA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

Gerd Hoffmann wrote:
> H. Peter Anvin wrote:
>> akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:
>>> This prototype has one problem though: on 32-bit archs the (64-bit)
>>> offset argument is unaligned, which the syscall ABI of several archs doesn't
>>> allow.  At least s390 needs a wrapper in glibc to handle this.  As
>>> we'll need wrappers in glibc anyway, I've decided to push the problem to
>>> glibc entirely and use a syscall prototype which works without
>>> arch-specific wrappers inside the kernel: the offset argument is
>>> explicitly split into two 32-bit values.
>> That rather sucks.  It'd be cleaner to just shuffle the argument order.
> 
> That was discussed too.  Doesn't solve the problem that you need some
> wrap-o-magic in glibc (this time to swap arguments instead of splitting
> the offset).  And it has the drawback that it is confusing to have
> different argument ordering at application and syscall level (think
> strace).  Check the archives for the details.
> 

A pointer would help.

I would say that different argument order in strace is a lot better than
having an argument completely mangled, which is what one gets with this
as proposed.

All in all, I think it is WRONG to make sane architectures suffer for
what broken architectures have to do, and we should implement this the
sane way without any shuffling, splitting, or other braindamage.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
       [not found]             ` <498879B3.8030002-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-02-03 17:15               ` Russell King
       [not found]                 ` <20090203171559.GA20898-f404yB8NqCZvn6HldHNs0ANdhmdF6hFW@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King @ 2009-02-03 17:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Gerd Hoffmann, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, mingo-X9Un+BFzKDI,
	ralf-6z/3iImG2C8G8FEW9MqTrA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Tue, Feb 03, 2009 at 09:06:59AM -0800, H. Peter Anvin wrote:
> Gerd Hoffmann wrote:
> > H. Peter Anvin wrote:
> >> akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org wrote:
> >>> This prototype has one problem though: on 32-bit archs the (64-bit)
> >>> offset argument is unaligned, which the syscall ABI of several archs doesn't
> >>> allow.  At least s390 needs a wrapper in glibc to handle this.  As
> >>> we'll need wrappers in glibc anyway, I've decided to push the problem to
> >>> glibc entirely and use a syscall prototype which works without
> >>> arch-specific wrappers inside the kernel: the offset argument is
> >>> explicitly split into two 32-bit values.
> >> That rather sucks.  It'd be cleaner to just shuffle the argument order.
> > 
> > That was discussed too.  Doesn't solve the problem that you need some
> > wrap-o-magic in glibc (this time to swap arguments instead of splitting
> > the offset).  And it has the drawback that it is confusing to have
> > different argument ordering at application and syscall level (think
> > strace).  Check the archives for the details.
> > 
> 
> A pointer would help.
> 
> I would say that different argument order in strace is a lot better than
> having an argument completely mangled, which is what one gets with this
> as proposed.
> 
> All in all, I think it is WRONG to make sane architectures suffer for
> what broken architectures have to do, and we should implement this the
> sane way without any shuffling, splitting, or other braindamage.

Disagree.  What is better: having _one_ _common_ argument order in the
syscall interface, or having architectures end up doing their own thing?

Think about your strace argument - you're effectively requiring strace
to know that on architecture X, the syscall argument order is X1, but
on architecture Y, that same syscall has argument order Y1, and maybe
architecture Z, it's again a different order Z1.

That _adds_ complexity to strace - rather than having one ordering to
deal with for a syscall, it now has three different random orders.

That's completely insane.  With one _common_ ordering, at least strace
has the possibility of cleanly sorting it out.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
       [not found]                 ` <20090203171559.GA20898-f404yB8NqCZvn6HldHNs0ANdhmdF6hFW@public.gmane.org>
@ 2009-02-03 17:29                   ` H. Peter Anvin
       [not found]                     ` <49887EF4.1090300-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2009-02-03 17:29 UTC (permalink / raw)
  To: H. Peter Anvin, Gerd Hoffmann,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	linux-api-u79uwXL29TY76Z2rM5mHXA, linux-arch

Russell King wrote:
>>
>> All in all, I think it is WRONG to make sane architectures suffer for
>> what broken architectures have to do, and we should implement this the
>> sane way without any shuffling, splitting, or other braindamage.
> 
> Disagree.  What is better: having _one_ _common_ argument order in the
> syscall interface, or having architectures end up doing their own thing?
> 
> Think about your strace argument - you're effectively requiring strace
> to know that on architecture X, the syscall argument order is X1, but
> on architecture Y, that same syscall has argument order Y1, and maybe
> architecture Z, it's again a different order Z1.
> 
> That _adds_ complexity to strace - rather than having one ordering to
> deal with for a syscall, it now has three different random orders.
> 
> That's completely insane.  With one _common_ ordering, at least strace
> has the possibility of cleanly sorting it out.
> 

What we *SHOULD* be doing is to *HAVE ABI RULES AND STICK TO THEM*.
Which is that on architectures that need a padding register, we add the
padding register on those architectures, and if they need stubs, then we
add stubs -- preferably by automation (cf. my automation scripts for
doing exactly that.)

What we *SHOULDN'T DO* is hand-craft the ABI of every system call to
avoid the rules we happen to have already set for ourselves.  If the
rules are too inflexible, they should be sanitized, not worked around.

The sheer number of system calls with weird exceptions to the ABI is a
total mess, and for absolutely no good reason.  All it means is an
exception which lasts forever, because no amount of automation in the
world can fix it once it is in.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
       [not found]                     ` <49887EF4.1090300-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
@ 2009-02-03 17:42                       ` Russell King
  2009-02-03 20:19                         ` H. Peter Anvin
  2009-02-05  0:42                         ` H. Peter Anvin
  0 siblings, 2 replies; 10+ messages in thread
From: Russell King @ 2009-02-03 17:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Gerd Hoffmann, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	mm-commits-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, mingo-X9Un+BFzKDI,
	ralf-6z/3iImG2C8G8FEW9MqTrA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Tue, Feb 03, 2009 at 09:29:24AM -0800, H. Peter Anvin wrote:
> Russell King wrote:
> >>
> >> All in all, I think it is WRONG to make sane architectures suffer for
> >> what broken architectures have to do, and we should implement this the
> >> sane way without any shuffling, splitting, or other braindamage.
> > 
> > Disagree.  What is better: having _one_ _common_ argument order in the
> > syscall interface, or having architectures end up doing their own thing?
> > 
> > Think about your strace argument - you're effectively requiring strace
> > to know that on architecture X, the syscall argument order is X1, but
> > on architecture Y, that same syscall has argument order Y1, and maybe
> > architecture Z, it's again a different order Z1.
> > 
> > That _adds_ complexity to strace - rather than having one ordering to
> > deal with for a syscall, it now has three different random orders.
> > 
> > That's completely insane.  With one _common_ ordering, at least strace
> > has the possibility of cleanly sorting it out.
> > 
> 
> What we *SHOULD* be doing is to *HAVE ABI RULES AND STICK TO THEM*.
> Which is that on architectures that need a padding register, we add the
> padding register on those architectures, and if they need stubs, then we
> add stubs -- preferably by automation (cf. my automation scripts for
> doing exactly that.)

This is an overly simplified view of things.  There exist syscalls where
it's not just a matter of padding, but instead where the chosen argument
order interacts with the ABI padding requirements to produce something
which the syscall interface can't handle.  There are two solutions to
that: reorder the syscall arguments or split them up into 32-bit high/low
quantities.

I'm thinking there of the silly fadvise64_64 syscall, where ARM was forced
into having a different argument ordering from everything else through
zero discussion.

So, what would your scripts do with the fadvise64_64 syscall on an
architecture requiring natural alignment of the 64-bit args in 32-bit
regs?
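
To make the interaction concrete, here is a small sketch that counts 32-bit
register slots under one assumed rule: a 64-bit argument must start in an
even-numbered register, so an odd position costs one padding slot.
slots_needed() is a hypothetical helper, and the rule is a simplification
of what real ABIs specify.

```c
#include <assert.h>
#include <stddef.h>

/*
 * arg_slots[i] is the size of argument i in 32-bit register slots:
 * 1 for int/pointer/long on a 32-bit arch, 2 for a 64-bit value.
 * Returns the total number of slots consumed, padding included,
 * assuming 64-bit values must begin at an even slot index.
 */
static int slots_needed(const int *arg_slots, size_t nargs)
{
	int next = 0;
	size_t i;

	for (i = 0; i < nargs; i++) {
		assert(arg_slots[i] == 1 || arg_slots[i] == 2);
		if (arg_slots[i] == 2 && (next & 1))
			next++;		/* skip one register as padding */
		next += arg_slots[i];
	}
	return next;
}
```

With sizes {1, 2, 2, 1}, i.e. fadvise64_64(fd, offset, len, advice), this
counts seven slots; reordering to {1, 1, 2, 2} counts six.  For preadv,
{1, 1, 1, 2} counts six including one pad, and the split form
{1, 1, 1, 1, 1} counts five.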

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
  2009-02-03 17:42                       ` Russell King
@ 2009-02-03 20:19                         ` H. Peter Anvin
  2009-02-05  0:42                         ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2009-02-03 20:19 UTC (permalink / raw)
  To: H. Peter Anvin, Gerd Hoffmann, akpm, mm-commits, arnd, linux-api,
	linux-arch

>
> I'm thinking there of the silly fadvise64_64 syscall, where ARM was forced
> into having a different argument ordering from everything else through
> zero discussion.
>
> So, what would your scripts do with the fadvise64_64 syscall on an
> architecture requiring natural alignment of the 64-bit args in 32-bit
> regs?
>

It depends on the exact rules: in most cases, it doesn't have to do
anything, because the calling convention on the user side and on the
kernel side simply cancel out.  This was lesson #1 while implementing
klibc (which does all system calls that aren't totally ad hoc via
autogenerated stubs.)  If, however, it is needed, the scripts I have can
handle arbitrary weirdness, including stuff like needing manual splitting
and even passing 64-bit arguments by reference.

The important bit here is consistent rules; ad hoc rules that are per
system call (as opposed to a set of *per platform* rules that are adhered
to) are just a disaster -- and one from which we can never recover with any
kind of automation.  Inconsistencies here hurt everything that involves
the system call interface: all libcs, the kernel itself, strace etc.

    -hpa


* Re: + preadv-pwritev-add-preadv-and-pwritev-system-calls.patch added to -mm tree
  2009-02-03 17:42                       ` Russell King
  2009-02-03 20:19                         ` H. Peter Anvin
@ 2009-02-05  0:42                         ` H. Peter Anvin
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2009-02-05  0:42 UTC (permalink / raw)
  To: H. Peter Anvin, Gerd Hoffmann, akpm, mm-commits, arnd, linux-api,
	linux-arch

Russell King wrote:
> 
> This is an overly simplified view of things.  There exist syscalls where
> it's not just a matter of padding, but instead where the chosen argument
> order interacts with the ABI padding requirements to produce something
> which the syscall interface can't handle.  There are two solutions to
> that: reorder the syscall arguments or split them up into 32-bit high/low
> quantities.
> 

I just went back and looked specifically at this, and I wonder if you 
could be more concrete.  In particular, this seems to imply a rule 
somewhere which could probably be improved reasonably easily.

	-hpa
