public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
       [not found]     ` <43BB5646.2040504@zytor.com>
@ 2006-01-04  5:33       ` H. Peter Anvin
  2006-01-04 17:02         ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2006-01-04  5:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linus Torvalds, Ulrich Drepper, Andi Kleen, Viro, Al,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

sendfile() has a limit of 2^31-1 bytes even on 64-bit platforms.  Linus 
wants to maintain it in order to avoid potential future security bugs 
(always a good thing.)

This patch changes the behaviour from returning EINVAL when this limit 
is exceeded to returning a short count.  This means that a 
properly-written userspace program will simply loop around and continue; 
  it will expose bugs in improperly-written userspace programs, which is 
also a good thing.  Additionally, the limit becomes an issue that is 
completely contained within the kernel, and not encoded in the kernel 
ABI, so it can be changed in the future.

(I set the limit to 2^31-PAGE_CACHE_SIZE so that a transfer that starts 
at the beginning of the file will continue to be page-aligned.)

The

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

[-- Attachment #2: sendfile.patch --]
[-- Type: text/x-patch, Size: 658 bytes --]

diff --git a/fs/read_write.c b/fs/read_write.c
index a091ee4..3712886 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -9,6 +9,7 @@
 #include <linux/fcntl.h>
 #include <linux/file.h>
 #include <linux/uio.h>
+#include <linux/pagemap.h>
 #include <linux/smp_lock.h>
 #include <linux/fsnotify.h>
 #include <linux/security.h>
@@ -631,6 +632,9 @@ static ssize_t do_sendfile(int out_fd, i
 	ssize_t retval;
 	int fput_needed_in, fput_needed_out;
 
+	/* Avoid potential security holes.  User space will get a short count and should loop. */
+	count = min(count, (size_t)0x80000000-PAGE_CACHE_SIZE);
+
 	/*
 	 * Get input file, and verify that it is ok..
 	 */

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
  2006-01-04  5:33       ` [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error H. Peter Anvin
@ 2006-01-04 17:02         ` Linus Torvalds
  2006-01-04 17:16           ` Ulrich Drepper
  2006-01-04 18:40           ` Linus Torvalds
  0 siblings, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2006-01-04 17:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ulrich Drepper, Andi Kleen, Viro, Al, linux-kernel



On Tue, 3 Jan 2006, H. Peter Anvin wrote:
> 
> (I set the limit to 2^31-PAGE_CACHE_SIZE so that a transfer that starts at the
> beginning of the file will continue to be page-aligned.)

Ok, this patch looks ok, if it's confirmed to unbreak apache.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
  2006-01-04 17:02         ` Linus Torvalds
@ 2006-01-04 17:16           ` Ulrich Drepper
  2006-01-04 18:41             ` H. Peter Anvin
  2006-01-04 18:40           ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Ulrich Drepper @ 2006-01-04 17:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Andi Kleen, Viro, Al, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

Linus Torvalds wrote:
> Ok, this patch looks ok, if it's confirmed to unbreak apache.

Yes, sounds reasonable.  But I don't think that, as Peter suggested in
another mail, that glibc should automatically wrap the syscall to
provide support for  larger sizes.  The caller probably should know when
a transfer was cut short.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
  2006-01-04 17:02         ` Linus Torvalds
  2006-01-04 17:16           ` Ulrich Drepper
@ 2006-01-04 18:40           ` Linus Torvalds
  2006-01-04 18:58             ` H. Peter Anvin
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2006-01-04 18:40 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ulrich Drepper, Andi Kleen, Viro, Al, linux-kernel



On Wed, 4 Jan 2006, Linus Torvalds wrote:
> 
> On Tue, 3 Jan 2006, H. Peter Anvin wrote:
> > 
> > (I set the limit to 2^31-PAGE_CACHE_SIZE so that a transfer that starts at the
> > beginning of the file will continue to be page-aligned.)
> 
> Ok, this patch looks ok, if it's confirmed to unbreak apache.

Actually, looking closer, this patch does the wrong thing for a size_t 
that is negative in ssize_t (which is technically "undefined behaviour" in 
POSIX, but turning it into a big positive number is objectively worse than 
returning -EINVAL).

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
  2006-01-04 17:16           ` Ulrich Drepper
@ 2006-01-04 18:41             ` H. Peter Anvin
  0 siblings, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2006-01-04 18:41 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linus Torvalds, Andi Kleen, Viro, Al, linux-kernel

Ulrich Drepper wrote:
> Linus Torvalds wrote:
> 
>>Ok, this patch looks ok, if it's confirmed to unbreak apache.
> 
> Yes, sounds reasonable.  But I don't think that, as Peter suggested in
> another mail, that glibc should automatically wrap the syscall to
> provide support for  larger sizes.  The caller probably should know when
> a transfer was cut short.
> 

Agreed; that's exactly what a short return in supposed to do.  Since a 
short return is compatible with the defined API, that's not a problem.

However, this patch only addresses sendfile(), and this is apparently a 
much more pervasive problem.  We need something similar for all the I/O 
system calls that are affected by this particular issue.

	-hpa

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
  2006-01-04 18:40           ` Linus Torvalds
@ 2006-01-04 18:58             ` H. Peter Anvin
  2006-01-04 19:28               ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2006-01-04 18:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ulrich Drepper, Andi Kleen, Viro, Al, linux-kernel

Linus Torvalds wrote:
> 
> On Wed, 4 Jan 2006, Linus Torvalds wrote:
> 
>>On Tue, 3 Jan 2006, H. Peter Anvin wrote:
>>
>>>(I set the limit to 2^31-PAGE_CACHE_SIZE so that a transfer that starts at the
>>>beginning of the file will continue to be page-aligned.)
>>
>>Ok, this patch looks ok, if it's confirmed to unbreak apache.
> 
> Actually, looking closer, this patch does the wrong thing for a size_t 
> that is negative in ssize_t (which is technically "undefined behaviour" in 
> POSIX, but turning it into a big positive number is objectively worse than 
> returning -EINVAL).
> 

OK, that's a fair cop.  I agree.  In fact, for readv/writev(), POSIX 
does specify:

"If the sum of the iov_len values is greater than {SSIZE_MAX}, the 
operation shall fail and no data shall be tranferred."

... which is good precedence for doing so for all values.

So, what system calls are affected?  sendfile, [p]read[v], [p]write[v], 
send*, recv*, any others?

	-hpa

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error
  2006-01-04 18:58             ` H. Peter Anvin
@ 2006-01-04 19:28               ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2006-01-04 19:28 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ulrich Drepper, Andi Kleen, Viro, Al, linux-kernel



On Wed, 4 Jan 2006, H. Peter Anvin wrote:
> 
> So, what system calls are affected?  sendfile, [p]read[v], [p]write[v], send*,
> recv*, any others?

Just {read|write}[v] and sendfile. With compat32 functions (although 
those, for obvious reasons, have just 32-bit counts anyway, so they have a 
natural limit from that).

The send*/recv* functions don't end up calling to lots of strange drivers 
(it will go through the network layer), so they don't have the same "limit 
damage from untrusted code" issues.

So here's a suggested patch.

It changes "rw_verify_area()" in a simple way:

 - it continues to return -EINVAL for truly invalid sizes (ie if the 
   "size_t" is negative in "ssize_t")
 - it returns the _truncated_ size for other cases (ie it will now return 
   a non-negative "suggested count" for success)

this means that the "hard limit" of INT_MAX is gone, and it's up to the 
caller to decide how it reacts to a positive value. A caller can choose to 
just ignore it ("I'll use my untruncated count, thank you") or can choose 
to take the truncated value.

In the first case, the only change is that instead of checking the return 
value against zero (any non-zero being error), the caller has to realize 
that positive values aren't error cases. So

	ret = rw_verify_area(...)
	if (ret)
		goto out;

turns into

	ret = rw_verify_area(...)
	if (ret < 0)
		goto out;

and in the second case, the caller just ends up adding a "count = ret" 
afterwards.

The rw_verify_area() function isn't exported to modules, and is used in 
only a few places, so this all seems pretty straightforward.

Untested, but mostly sane-looking patch appended. I do want to do 
something about the "readv|writev()" case, though, so it's not complete (I 
don't want to have readv/writev pass insane huge values down the stack to 
untrustworthy drivers..)

Comments? Can somebody test this with the apache thing that caused this 
to surface?

(And I'd like to stress that this really is untested. I've not even 
compile-tested it, so while it all looks very straightforward, typos and 
thinkos can abound)

		Linus

---
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 330cf84..60353f5 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -420,7 +420,7 @@ asmlinkage ssize_t sys32_pread(unsigned 
 		goto out;
 	pos = merge_64(a4, a5);
 	ret = rw_verify_area(READ, file, &pos, count);
-	if (ret)
+	if (ret < 0)
 		goto out;
 	ret = -EINVAL;
 	if (!file->f_op || !(read = file->f_op->read))
@@ -455,7 +455,7 @@ asmlinkage ssize_t sys32_pwrite(unsigned
 		goto out;
 	pos = merge_64(a4, a5);
 	ret = rw_verify_area(WRITE, file, &pos, count);
-	if (ret)
+	if (ret < 0)
 		goto out;
 	ret = -EINVAL;
 	if (!file->f_op || !(write = file->f_op->write))
diff --git a/fs/compat.c b/fs/compat.c
index 8186341..55ac032 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1170,7 +1170,7 @@ static ssize_t compat_do_readv_writev(in
 	}
 
 	ret = rw_verify_area(type, file, pos, tot_len);
-	if (ret)
+	if (ret < 0)
 		goto out;
 
 	fnv = NULL;
diff --git a/fs/read_write.c b/fs/read_write.c
index a091ee4..df3468a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -14,6 +14,7 @@
 #include <linux/security.h>
 #include <linux/module.h>
 #include <linux/syscalls.h>
+#include <linux/pagemap.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -182,22 +183,33 @@ bad:
 }
 #endif
 
+/*
+ * rw_verify_area doesn't like huge counts. We limit
+ * them to something that fits in "int" so that others
+ * won't have to do range checks all the time.
+ */
+#define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)
 
 int rw_verify_area(int read_write, struct file *file, loff_t *ppos, size_t count)
 {
 	struct inode *inode;
 	loff_t pos;
 
-	if (unlikely(count > INT_MAX))
+	if (unlikely((ssize_t) count < 0))
 		goto Einval;
 	pos = *ppos;
 	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
 		goto Einval;
 
 	inode = file->f_dentry->d_inode;
-	if (inode->i_flock && MANDATORY_LOCK(inode))
-		return locks_mandatory_area(read_write == READ ? FLOCK_VERIFY_READ : FLOCK_VERIFY_WRITE, inode, file, pos, count);
-	return 0;
+	if (inode->i_flock && MANDATORY_LOCK(inode)) {
+		int retval = locks_mandatory_area(
+			read_write == READ ? FLOCK_VERIFY_READ : FLOCK_VERIFY_WRITE,
+			inode, file, pos, count);
+		if (retval < 0)
+			return retval;
+	}
+	return count > MAX_RW_COUNT ? MAX_RW_COUNT : count;
 
 Einval:
 	return -EINVAL;
@@ -244,7 +256,8 @@ ssize_t vfs_read(struct file *file, char
 		return -EFAULT;
 
 	ret = rw_verify_area(READ, file, pos, count);
-	if (!ret) {
+	if (ret >= 0) {
+		count = ret;
 		ret = security_file_permission (file, MAY_READ);
 		if (!ret) {
 			if (file->f_op->read)
@@ -295,7 +308,8 @@ ssize_t vfs_write(struct file *file, con
 		return -EFAULT;
 
 	ret = rw_verify_area(WRITE, file, pos, count);
-	if (!ret) {
+	if (ret >= 0) {
+		count = ret;
 		ret = security_file_permission (file, MAY_WRITE);
 		if (!ret) {
 			if (file->f_op->write)
@@ -497,7 +511,7 @@ static ssize_t do_readv_writev(int type,
 	}
 
 	ret = rw_verify_area(type, file, pos, tot_len);
-	if (ret)
+	if (ret < 0)
 		goto out;
 	ret = security_file_permission(file, type == READ ? MAY_READ : MAY_WRITE);
 	if (ret)
@@ -653,8 +667,9 @@ static ssize_t do_sendfile(int out_fd, i
 		if (!(in_file->f_mode & FMODE_PREAD))
 			goto fput_in;
 	retval = rw_verify_area(READ, in_file, ppos, count);
-	if (retval)
+	if (retval < 0)
 		goto fput_in;
+	count = retval;
 
 	retval = security_file_permission (in_file, MAY_READ);
 	if (retval)
@@ -674,8 +689,9 @@ static ssize_t do_sendfile(int out_fd, i
 		goto fput_out;
 	out_inode = out_file->f_dentry->d_inode;
 	retval = rw_verify_area(WRITE, out_file, &out_file->f_pos, count);
-	if (retval)
+	if (retval < 0)
 		goto fput_out;
+	count = retval;
 
 	retval = security_file_permission (out_file, MAY_WRITE);
 	if (retval)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-01-04 19:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <43BB348F.3070108@zytor.com>
     [not found] ` <200601040451.20411.ak@suse.de>
     [not found]   ` <Pine.LNX.4.64.0601032051300.3668@g5.osdl.org>
     [not found]     ` <43BB5646.2040504@zytor.com>
2006-01-04  5:33       ` [PATCH] Limit sendfile() to 2^31-PAGE_CACHE_SIZE bytes without error H. Peter Anvin
2006-01-04 17:02         ` Linus Torvalds
2006-01-04 17:16           ` Ulrich Drepper
2006-01-04 18:41             ` H. Peter Anvin
2006-01-04 18:40           ` Linus Torvalds
2006-01-04 18:58             ` H. Peter Anvin
2006-01-04 19:28               ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox