public inbox for linux-kernel@vger.kernel.org
* [PATCH v4 0/3] System call design: preadv & pwritev.
@ 2008-12-18 11:42 Gerd Hoffmann
  2008-12-18 11:42 ` [PATCH v4 1/3] Add missing accounting calls to compat_sys_{readv,writev} Gerd Hoffmann
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Gerd Hoffmann @ 2008-12-18 11:42 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

  Hi folks,

Guess I have your attention now thanks to LWN ;)

Next round of the preadv & pwritev patch series, hopefully finally
solving the architecture issues.  I've decided to go with the suggestion
from the s390 guys:  split the 64bit offset into two explicitly coded
32bit halves.  The syscall prototype should work now for all archs
without (in-kernel) wrappers.  glibc must wrap the syscalls though to
hide that split from the applications.  But there is no way around glibc
wrappers anyway, so I think that is a sensible decision.

cheers,
  Gerd



* [PATCH v4 1/3] Add missing accounting calls to compat_sys_{readv,writev}.
  2008-12-18 11:42 [PATCH v4 0/3] System call design: preadv & pwritev Gerd Hoffmann
@ 2008-12-18 11:42 ` Gerd Hoffmann
  2008-12-18 11:42 ` [PATCH v4 2/3] Add preadv and pwritev system calls Gerd Hoffmann
  2008-12-18 11:42 ` [PATCH v4 3/3] MIPS: Add preadv(2) and pwritev(2) syscalls Gerd Hoffmann
  2 siblings, 0 replies; 6+ messages in thread
From: Gerd Hoffmann @ 2008-12-18 11:42 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

[ Note: unrelated bugfix spotted during preadv review,
  included for completeness, akpm picked it into -mm already. ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 fs/compat.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index e5f49f5..aab2234 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1187,6 +1187,9 @@ compat_sys_readv(unsigned long fd, const struct compat_iovec __user *vec, unsign
 	ret = compat_do_readv_writev(READ, file, vec, vlen, &file->f_pos);
 
 out:
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
 	fput(file);
 	return ret;
 }
@@ -1210,6 +1213,9 @@ compat_sys_writev(unsigned long fd, const struct compat_iovec __user *vec, unsig
 	ret = compat_do_readv_writev(WRITE, file, vec, vlen, &file->f_pos);
 
 out:
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
 	fput(file);
 	return ret;
 }
-- 
1.5.6.5



* [PATCH v4 2/3] Add preadv and pwritev system calls.
  2008-12-18 11:42 [PATCH v4 0/3] System call design: preadv & pwritev Gerd Hoffmann
  2008-12-18 11:42 ` [PATCH v4 1/3] Add missing accounting calls to compat_sys_{readv,writev} Gerd Hoffmann
@ 2008-12-18 11:42 ` Gerd Hoffmann
  2008-12-18 12:34   ` Arnd Bergmann
  2008-12-18 11:42 ` [PATCH v4 3/3] MIPS: Add preadv(2) and pwritev(2) syscalls Gerd Hoffmann
  2 siblings, 1 reply; 6+ messages in thread
From: Gerd Hoffmann @ 2008-12-18 11:42 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Gerd Hoffmann

This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.
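
To illustrate the point about race windows, here is a minimal userspace
sketch.  It assumes a libc that already exposes preadv() with the
BSD-style prototype; the helper name read_record_at() and the
header/body layout are made up for the example and are not part of this
patch.

```c
#define _DEFAULT_SOURCE   /* asks glibc to declare preadv() in <sys/uio.h> */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a header and a body from `fd` at the
 * absolute offset `off` with a single call.  The shared file position
 * of `fd` is neither used nor changed, so threads sharing the
 * descriptor need no lock around the call (unlike lseek() + readv()).
 */
static ssize_t read_record_at(int fd, void *hdr, size_t hdr_len,
			      void *body, size_t body_len, off_t off)
{
	struct iovec iov[2] = {
		{ .iov_base = hdr,  .iov_len = hdr_len  },
		{ .iov_base = body, .iov_len = body_len },
	};

	/* Returns the number of bytes read, or -1 with errno set. */
	return preadv(fd, iov, 2, off);
}
```

As with readv(), a short read is possible, so a real caller would check
the return value against hdr_len + body_len.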

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though:  On 32bit archs the (64bit)
offset argument is unaligned, which the syscall ABI of several archs
doesn't allow.  At least s390 needs a wrapper in glibc to handle
this.  As we'll need wrappers in glibc anyway I've decided to push the
problem to glibc entirely and use a syscall prototype which works
without arch-specific wrappers inside the kernel:  The offset argument
is explicitly split into two 32bit values.
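
The split and the reassembly can be sketched in plain C as follows.
This is only an illustration: split_offset() and join_offset() are
hypothetical names, int64_t stands in for off_t/loff_t, and the shift is
done on an unsigned value (unlike the expression in the patch) so the
sketch stays well-defined in portable C.

```c
#include <stdint.h>

/* Userspace side: what a libc wrapper would do before the syscall. */
static void split_offset(int64_t off, uint32_t *high, uint32_t *low)
{
	*high = (uint32_t)((uint64_t)off >> 32);
	*low  = (uint32_t)((uint64_t)off & 0xffffffffu);
}

/*
 * Kernel side: rebuild the 64bit position from the two halves, in the
 * same spirit as "((loff_t)pos_high << 32) | pos_low" in the patch.
 */
static int64_t join_offset(uint32_t high, uint32_t low)
{
	return (int64_t)(((uint64_t)high << 32) | low);
}
```

A negative offset survives the round trip as a negative value, which is
what lets the syscall reject it with the "pos < 0" check.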

The patch contains the actual system call implementation and the wire-up in
the x86 system call tables.  Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/x86/ia32/ia32entry.S          |    2 +
 arch/x86/include/asm/unistd_32.h   |    2 +
 arch/x86/include/asm/unistd_64.h   |    4 ++
 arch/x86/kernel/syscall_table_32.S |    2 +
 fs/compat.c                        |   63 ++++++++++++++++++++++++++++++++++++
 fs/read_write.c                    |   50 ++++++++++++++++++++++++++++
 6 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 256b00b..9a8501b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -826,4 +826,6 @@ ia32_sys_call_table:
 	.quad sys_dup3			/* 330 */
 	.quad sys_pipe2
 	.quad sys_inotify_init1
+	.quad compat_sys_preadv
+	.quad compat_sys_pwritev
 ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index f2bba78..6e72d74 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -338,6 +338,8 @@
 #define __NR_dup3		330
 #define __NR_pipe2		331
 #define __NR_inotify_init1	332
+#define __NR_preadv		333
+#define __NR_pwritev		334
 
 #ifdef __KERNEL__
 
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index d2e415e..f818294 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -653,6 +653,10 @@ __SYSCALL(__NR_dup3, sys_dup3)
 __SYSCALL(__NR_pipe2, sys_pipe2)
 #define __NR_inotify_init1			294
 __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
+#define __NR_preadv				295
+__SYSCALL(__NR_preadv, sys_preadv)
+#define __NR_pwritev				296
+__SYSCALL(__NR_pwritev, sys_pwritev)
 
 
 #ifndef __NO_STUBS
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index d44395f..a1a5506 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -332,3 +332,5 @@ ENTRY(sys_call_table)
 	.long sys_dup3			/* 330 */
 	.long sys_pipe2
 	.long sys_inotify_init1
+	.long sys_preadv
+	.long sys_pwritev
diff --git a/fs/compat.c b/fs/compat.c
index aab2234..00e18aa 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1220,6 +1220,69 @@ out:
 	return ret;
 }
 
+asmlinkage ssize_t
+compat_sys_preadv(unsigned long fd, const struct compat_iovec __user *vec,
+                  unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+        loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+
+	if (!(file->f_mode & FMODE_READ))
+		goto out;
+
+	ret = -EINVAL;
+	if (!file->f_op || (!file->f_op->aio_read && !file->f_op->read))
+		goto out;
+
+	ret = compat_do_readv_writev(READ, file, vec, vlen, &pos);
+
+out:
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
+	fput(file);
+	return ret;
+}
+
+asmlinkage ssize_t
+compat_sys_pwritev(unsigned long fd, const struct compat_iovec __user *vec,
+                   unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+        loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget(fd);
+	if (!file)
+		return -EBADF;
+	if (!(file->f_mode & FMODE_WRITE))
+		goto out;
+
+	ret = -EINVAL;
+	if (!file->f_op || (!file->f_op->aio_write && !file->f_op->write))
+		goto out;
+
+	ret = compat_do_readv_writev(WRITE, file, vec, vlen, &pos);
+
+out:
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
+	fput(file);
+	return ret;
+}
+
 asmlinkage long
 compat_sys_vmsplice(int fd, const struct compat_iovec __user *iov32,
 		    unsigned int nr_segs, unsigned int flags)
diff --git a/fs/read_write.c b/fs/read_write.c
index 969a6d9..27779d0 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -701,6 +701,56 @@ sys_writev(unsigned long fd, const struct iovec __user *vec, unsigned long vlen)
 	return ret;
 }
 
+asmlinkage ssize_t sys_preadv(unsigned long fd, const struct iovec __user *vec,
+                              unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+        loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PREAD)
+			ret = vfs_readv(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_rchar(current, ret);
+	inc_syscr(current);
+	return ret;
+}
+
+asmlinkage ssize_t sys_pwritev(unsigned long fd, const struct iovec __user *vec,
+                               unsigned long vlen, u32 pos_high, u32 pos_low)
+{
+        loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+	struct file *file;
+	ssize_t ret = -EBADF;
+	int fput_needed;
+
+	if (pos < 0)
+		return -EINVAL;
+
+	file = fget_light(fd, &fput_needed);
+	if (file) {
+		ret = -ESPIPE;
+		if (file->f_mode & FMODE_PWRITE)
+			ret = vfs_writev(file, vec, vlen, &pos);
+		fput_light(file, fput_needed);
+	}
+
+	if (ret > 0)
+		add_wchar(current, ret);
+	inc_syscw(current);
+	return ret;
+}
+
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 			   size_t count, loff_t max)
 {
-- 
1.5.6.5



* [PATCH v4 3/3] MIPS: Add preadv(2) and pwritev(2) syscalls.
  2008-12-18 11:42 [PATCH v4 0/3] System call design: preadv & pwritev Gerd Hoffmann
  2008-12-18 11:42 ` [PATCH v4 1/3] Add missing accounting calls to compat_sys_{readv,writev} Gerd Hoffmann
  2008-12-18 11:42 ` [PATCH v4 2/3] Add preadv and pwritev system calls Gerd Hoffmann
@ 2008-12-18 11:42 ` Gerd Hoffmann
  2 siblings, 0 replies; 6+ messages in thread
From: Gerd Hoffmann @ 2008-12-18 11:42 UTC (permalink / raw)
  To: linux-kernel, linux-arch, linux-api; +Cc: aarcange, Ralf Baechle, Gerd Hoffmann

From: Ralf Baechle <ralf@linux-mips.org>

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 arch/mips/include/asm/unistd.h |   18 ++++++++++++------
 arch/mips/kernel/scall32-o32.S |    2 ++
 arch/mips/kernel/scall64-64.S  |    2 ++
 arch/mips/kernel/scall64-n32.S |    2 ++
 arch/mips/kernel/scall64-o32.S |    2 ++
 5 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index a73e153..4000501 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -350,16 +350,18 @@
 #define __NR_dup3			(__NR_Linux + 327)
 #define __NR_pipe2			(__NR_Linux + 328)
 #define __NR_inotify_init1		(__NR_Linux + 329)
+#define __NR_preadv			(__NR_Linux + 330)
+#define __NR_pwritev			(__NR_Linux + 331)
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls		329
+#define __NR_Linux_syscalls		331
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux			4000
-#define __NR_O32_Linux_syscalls		329
+#define __NR_O32_Linux_syscalls		331
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -656,16 +658,18 @@
 #define __NR_dup3			(__NR_Linux + 286)
 #define __NR_pipe2			(__NR_Linux + 287)
 #define __NR_inotify_init1		(__NR_Linux + 288)
+#define __NR_preadv			(__NR_Linux + 289)
+#define __NR_pwritev			(__NR_Linux + 290)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls		288
+#define __NR_Linux_syscalls		290
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux			5000
-#define __NR_64_Linux_syscalls		288
+#define __NR_64_Linux_syscalls		290
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -966,16 +970,18 @@
 #define __NR_dup3			(__NR_Linux + 290)
 #define __NR_pipe2			(__NR_Linux + 291)
 #define __NR_inotify_init1		(__NR_Linux + 292)
+#define __NR_preadv			(__NR_Linux + 293)
+#define __NR_pwritev			(__NR_Linux + 294)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls		292
+#define __NR_Linux_syscalls		294
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux			6000
-#define __NR_N32_Linux_syscalls		292
+#define __NR_N32_Linux_syscalls		294
 
 #ifdef __KERNEL__
 
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index d0916a5..4a8b3e8 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -650,6 +650,8 @@ einval:	li	v0, -ENOSYS
 	sys	sys_dup3		3
 	sys	sys_pipe2		2
 	sys	sys_inotify_init1	1
+	sys	sys_preadv		6	/* 4330 */
+	sys	sys_pwritev		6
 	.endm
 
 	/* We pre-compute the number of _instruction_ bytes needed to
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index a9e1716..217e3ce 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -487,4 +487,6 @@ sys_call_table:
 	PTR	sys_dup3
 	PTR	sys_pipe2
 	PTR	sys_inotify_init1
+	PTR	sys_preadv
+	PTR	sys_pwritev			/* 5390 */
 	.size	sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 30f3b63..f340963 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -413,4 +413,6 @@ EXPORT(sysn32_call_table)
 	PTR	sys_dup3			/* 5290 */
 	PTR	sys_pipe2
 	PTR	sys_inotify_init1
+	PTR	sys_preadv
+	PTR	sys_pwritev
 	.size	sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index fefef4a..b1d281a 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -533,4 +533,6 @@ sys_call_table:
 	PTR	sys_dup3
 	PTR	sys_pipe2
 	PTR	sys_inotify_init1
+	PTR	compat_sys_preadv		/* 4330 */
+	PTR	compat_sys_pwritev
 	.size	sys_call_table,.-sys_call_table
-- 
1.5.6.5



* Re: [PATCH v4 2/3] Add preadv and pwritev system calls.
  2008-12-18 11:42 ` [PATCH v4 2/3] Add preadv and pwritev system calls Gerd Hoffmann
@ 2008-12-18 12:34   ` Arnd Bergmann
  2008-12-18 13:37     ` Gerd Hoffmann
  0 siblings, 1 reply; 6+ messages in thread
From: Arnd Bergmann @ 2008-12-18 12:34 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: linux-kernel, linux-arch, linux-api, aarcange

On Thursday 18 December 2008, Gerd Hoffmann wrote:
> This prototype has one problem though:  On 32bit archs is the (64bit)
> offset argument unaligned, which the syscall ABI of several archs
> doesn't allow to do.  At least s390 needs a wrapper in glibc to handle
> this.  As we'll need a wrappers in glibc anyway I've decided to push
> problem to glibc entriely and use a syscall prototype which works
> without arch-specific wrappers inside the kernel:  The offset argument
> is explicitly splitted into two 32bit values.

Obviously, the interface looks good to me now.

Please remember to add the function prototypes to
include/linux/syscalls.h.

> +asmlinkage ssize_t
> +compat_sys_preadv(unsigned long fd, const struct compat_iovec __user *vec,
> +                  unsigned long vlen, u32 pos_high, u32 pos_low)
> +{
> +        loff_t pos = ((loff_t)pos_high << 32) | pos_low;

This is whitespace damaged, as are the other three functions in the
same place.

> +	struct file *file;
> +	ssize_t ret = -EBADF;
> +
> +	if (pos < 0)
> +		return -EINVAL;
> +
> +	file = fget(fd);
> +	if (!file)
> +		return -EBADF;

Any reason for using fget() here, but fget_light() in sys_preadv?

> +	if (!(file->f_mode & FMODE_READ))
> +		goto out;
> +
> +	ret = -EINVAL;
> +	if (!file->f_op || (!file->f_op->aio_read && !file->f_op->read))
> +		goto out;

Maybe this logic could get moved into a compat_readv() function,
similar to vfs_readv(). The advantage would be that we can make
the native and compat paths more similar.

	Arnd <><


* Re: [PATCH v4 2/3] Add preadv and pwritev system calls.
  2008-12-18 12:34   ` Arnd Bergmann
@ 2008-12-18 13:37     ` Gerd Hoffmann
  0 siblings, 0 replies; 6+ messages in thread
From: Gerd Hoffmann @ 2008-12-18 13:37 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-kernel, linux-arch, linux-api, aarcange

  Hi,

> Please remember to add the function prototypes to
> include/linux/syscalls.h.

Done.

>> +asmlinkage ssize_t
>> +compat_sys_preadv(unsigned long fd, const struct compat_iovec __user *vec,
>> +                  unsigned long vlen, u32 pos_high, u32 pos_low)
>> +{
>> +        loff_t pos = ((loff_t)pos_high << 32) | pos_low;
> 
> This is whitespace damaged, as are the other three functions in the
> same place.

Fixed.

>> +	struct file *file;
>> +	ssize_t ret = -EBADF;
>> +
>> +	if (pos < 0)
>> +		return -EINVAL;
>> +
>> +	file = fget(fd);
>> +	if (!file)
>> +		return -EBADF;
> 
> Any reason for using fget() here, but fget_light() in sys_preadv?

Because the new syscalls are derived from the readv()/writev()
versions, which have the same inconsistency.  Dunno why.  I'd suspect
the reason is that the compat syscalls tend to be overlooked when the
code is updated.  Proof: patch #1 in this series ...

I'll switch them all over in a separate followup patch.

>> +	if (!(file->f_mode & FMODE_READ))
>> +		goto out;
>> +
>> +	ret = -EINVAL;
>> +	if (!file->f_op || (!file->f_op->aio_read && !file->f_op->read))
>> +		goto out;
> 
> Maybe this logic could get moved into a compat_readv() function,
> similar to vfs_readv().

Yes, makes sense.

cheers,
  Gerd



