Re: negative seek offsets in VFS

linux-fsdevel.vger.kernel.org archive mirror

* Re: negative seek offsets in VFS
       [not found] <s29588e0.089@sinclair.provo.novell.com>
@ 2005-05-26 17:49 ` Bryan Henderson
  2005-05-26 19:23   ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Bryan Henderson @ 2005-05-26 17:49 UTC (permalink / raw)
  To: Paul Taysom; +Cc: ak, linux-fsdevel, viro

>The addresses returned from /proc/kallsyms on the x86_64 are negative and when
>I print the address of a kernel variable with "%p" it comes out negative.

Wow.  So can someone explain this?  Does the x86_64 architecture actually 
have a concept of negative addresses?

Ordinarily, I'd think this was a bug if I saw it, but I don't see how %p 
could format a minus sign without it being deliberate.

In any case, I know POSIX doesn't have a concept of a negative absolute 
file offset, so a kmem device cannot represent a negative address as a 
negative file offset.  A negative loff_t means something else.


* Re: negative seek offsets in VFS
  2005-05-26 17:49 ` negative seek offsets in VFS Bryan Henderson
@ 2005-05-26 19:23   ` Andi Kleen
  2005-05-26 21:17     ` Bryan Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-05-26 19:23 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: Paul Taysom, linux-fsdevel, viro

On Thu, May 26, 2005 at 10:49:57AM -0700, Bryan Henderson wrote:
> >The addresses returned from /proc/kallsyms on the x86_64 are negative and when
> >I print the address of a kernel variable with "%p" it comes out negative.
> 
> Wow.  So can someone explain this?  Does the x86_64 architecture actually 
> have a concept of negative addresses?

Yes, it has. It has a 48bit address space, and the 48th bit 
is sign extended to 64bits. So you have a big hole in the middle
and positive and negative address spaces. The kernel uses the negative
half, user space the positive half.
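
To make the sign extension concrete, here is a minimal sketch in C. It
assumes the 48-bit layout described above; canonical48() and
is_negative_half() are hypothetical helper names for illustration, not
kernel functions.

```c
#include <stdint.h>

/*
 * Sign-extend bit 47 (the 48th bit) into bits 48..63, which is what a
 * canonical address looks like under the 48-bit layout described above.
 * Hypothetical helper, not kernel code.
 */
static uint64_t canonical48(uint64_t addr)
{
	uint64_t low = addr & 0x0000ffffffffffffULL;	/* keep the low 48 bits */

	if (low & (1ULL << 47))				/* top implemented bit set? */
		low |= 0xffff000000000000ULL;		/* upper ("negative") half */
	return low;
}

/* Nonzero when the address is negative if viewed as a signed 64-bit value. */
static int is_negative_half(uint64_t addr)
{
	return canonical48(addr) > (uint64_t)INT64_MAX;
}
```

With this, an address such as 0xffffffff80000000 lands in the negative
half, while a typical user address such as 0x400000 does not.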

> 
> Ordinarily, I'd think this was a bug if I saw it, but I don't see how %p 
> could format a minus sign without it being deliberate.

It doesn't, but they are still negative. 
 
> In any case, I know POSIX doesn't have a concept of a negative absolute 
> file offset, so a kmem device cannot represent a negative address as a 
> negative file offset.  A negative loff_t means something else.

I don't think POSIX has anything to say about /dev/kmem.

-Andi

* Re: negative seek offsets in VFS
  2005-05-26 19:23   ` Andi Kleen
@ 2005-05-26 21:17     ` Bryan Henderson
  2005-05-27 10:43       ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Bryan Henderson @ 2005-05-26 21:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-fsdevel, Paul Taysom, viro

>I don't think POSIX has anything to say about /dev/kmem.

Not directly, but since the entire point of /dev/kmem is to make kernel 
memory accessible via the POSIX file interface, POSIX has plenty to say 
about it.

In architectures with unsigned addresses, there's an obvious way to 
implement /dev/kmem -- consider the memory address to be the file offset.

I don't see how you can do that with negative memory addresses.  POSIX 
doesn't have negative file offsets.  There is nothing before the beginning 
of a file.  How does a POSIX program read address -1 via /dev/kmem?

>> if you allowed a negative offset and just 
>> declared that it stands for the large positive offset you'd get if you 
>> coerced it to an unsigned 64 bit integer, then how would you tell a 
>> success from a failure in the return code?
>
>The same as any other Linux kernel interfaces do it.   The range from -4095 to -1
>is reserved for error codes, the rest is free to use.

I'm familiar with the ERR_PTR strategy for packing a kernel address and an 
errno into a single word, but I haven't seen this for anything else.  I'm 
talking about the POSIX interface -- i.e. what the caller of the lseek() C 
library routine would see.

I know we're looking at interfaces below the POSIX interface, but I don't 
see how recognizing negative file offsets at this level is possible 
without also recognizing them at the POSIX level.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems 


* Re: negative seek offsets in VFS
  2005-05-26 21:17     ` Bryan Henderson
@ 2005-05-27 10:43       ` Andi Kleen
  2005-05-27 18:39         ` Bryan Henderson
  2005-05-28 12:37         ` Jamie Lokier
  0 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-27 10:43 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, Paul Taysom, viro

> I'm familiar with the ERR_PTR strategy for packing a kernel address and an 
> errno into a single word, but I haven't seen this for anything else.  I'm 
> talking about the POSIX interface -- i.e. what the caller of the lseek() C 
> library routine would see.

I am not talking about kernel internal interfaces like ERR_PTR.
The Linux system call interface on most (all?) architectures
reserves -1 to -4095 for error returns. When such an error is detected
it is converted to errno and -1. This applies to all system
calls.

Take a look at unistd.h of your favourite architecture if you
don't believe me.
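
A minimal sketch of that conversion in C, for illustration only.
decode_syscall_return() and MAX_ERRNO_SKETCH are hypothetical names, and
the value 4095 is taken from the range given above; this is not the code
of any particular libc.

```c
#include <errno.h>

#define MAX_ERRNO_SKETCH 4095UL	/* assumed size of the reserved error range */

/*
 * Turn a raw kernel return value into the libc convention: values in
 * [-4095, -1] become -1 with errno set, everything else passes through
 * unchanged (including negative values below -4095).
 */
static long decode_syscall_return(long raw)
{
	if ((unsigned long)raw >= (unsigned long)-MAX_ERRNO_SKETCH) {
		errno = (int)-raw;
		return -1;
	}
	return raw;
}
```

The unsigned comparison is what lets a single test cover the whole
reserved range: -1 maps to the largest unsigned value and -4095 to the
smallest one that still counts as an error.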

-Andi


* Re: negative seek offsets in VFS
  2005-05-27 10:43       ` Andi Kleen
@ 2005-05-27 18:39         ` Bryan Henderson
  2005-05-28 12:41           ` Jamie Lokier
  2005-05-30  9:36           ` Andi Kleen
  2005-05-28 12:37         ` Jamie Lokier
  1 sibling, 2 replies; 21+ messages in thread
From: Bryan Henderson @ 2005-05-27 18:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-fsdevel, Paul Taysom, viro

>I am not talking about kernel internal interfaces like ERR_PTR.
>The Linux system call interface on most (all?) architectures
>reserves -1 to -4095 for error returns. When such an error is detected
>it is converted to errno and -1. This applies to all system
>calls.

OK, I remember that now.  I probably forgot it because I couldn't think of 
any use for that special range of values more negative than -4095.  Can 
you give an example of a system call that returns integers less than 
-4095?

And separately, I'm still confused about how you expect /dev/kmem file 
offsets to work.  While I was mistaken about the existence of negative 
addresses, I know there are no negative file offsets in POSIX.  So how 
does one look at a negative address via the POSIX file interface and 
/dev/kmem?

Even if you add the concept of negative file offsets as a glibc/Linux 
extension of POSIX, you have an ambiguity problem at least with offset -1, 
right?

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems 





* Re: negative seek offsets in VFS
  2005-05-27 18:39         ` Bryan Henderson
@ 2005-05-28 12:41           ` Jamie Lokier
  2005-05-31 18:08             ` Bryan Henderson
  2005-05-30  9:36           ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2005-05-28 12:41 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: Andi Kleen, linux-fsdevel, Paul Taysom, viro

Bryan Henderson wrote:
> OK, I remember that now.  I probably forgot it because I couldn't think of 
> any use for that special range of values more negative than -4095.  Can 
> you give an example of a system call that returns integers less than 
> -4095?

fcntl(fd, F_GETOWN, 0) can.

See fs/fcntl.c.

So can ptrace() on some architectures.

-- Jamie


* Re: negative seek offsets in VFS
  2005-05-28 12:41           ` Jamie Lokier
@ 2005-05-31 18:08             ` Bryan Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2005-05-31 18:08 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andi Kleen, linux-fsdevel, Paul Taysom, viro

>Bryan Henderson wrote:
>> OK, I remember that now.  I probably forgot it because I couldn't think 
of 
>> any use for that special range of values more negative than -4095.  Can 

>> you give an example of a system call that returns integers less than 
>> -4095?
>
>fcntl(fd, F_GETOWN, 0) can.

Thanks.  Actually, I was looking for an example that returns integers from 
-1 through -4095 in the usual way (= -errno) and values below -4095 meaning 
something else, but this is even better.

So does fcntl() work on i386?  From what I can see in the code, a -9 
return from a fcntl(F_GETOWN) system call is ambiguous.  It could mean 
process group 9 or "bad file descriptor."  Is that so?  If so, is libc 
able to resolve the ambiguity, or is the fcntl() libc call broken in this 
arch?

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems 



* Re: negative seek offsets in VFS
  2005-05-27 18:39         ` Bryan Henderson
  2005-05-28 12:41           ` Jamie Lokier
@ 2005-05-30  9:36           ` Andi Kleen
  2005-05-31 18:33             ` Bryan Henderson
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-05-30  9:36 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, Paul Taysom, viro

On Fri, May 27, 2005 at 11:39:40AM -0700, Bryan Henderson wrote:
> OK, I remember that now.  I probably forgot it because I couldn't think of 
> any use for that special range of values more negative than -4095.  Can 
> you give an example of a system call that returns integers less than 
> -4095?

Various ioctls, for example, can return arbitrary values.

> 
> And separately, I'm still confused about how you expect /dev/kmem file 
> offsets to work.  While I was mistaken about the existence of negative 
> addresses, I know there are no negative file offsets in POSIX.  So how 
> does one look at a negative address via the POSIX file interface and 
> /dev/kmem?

pread,pwrite,lseek(...,SEEK_SET) should all work.

/dev/kmem is firmly outside POSIX, so this is fine.
For other devices the behaviour will not change.

> 
> Even if you add the concept of negative file offsets as a glibc/Linux 
> extension of POSIX, you have an ambiguity problem at least with offset -1, 
> right?

The kernel doesn't use this range as address, so in practice
the problem does not occur.

-Andi


* Re: negative seek offsets in VFS
  2005-05-30  9:36           ` Andi Kleen
@ 2005-05-31 18:33             ` Bryan Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2005-05-31 18:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-fsdevel, Paul Taysom, viro

>pread,pwrite,lseek(...,SEEK_SET) should all work.
>
>/dev/kmem is firmly outside POSIX, so this is fine.
>For other devices the behaviour will not change.
>
>> Even if you add the concept of negative file offsets as a glibc/Linux 
>> extension of POSIX, you have an ambiguity problem at least with offset -1, 
>> right?
>
>The kernel doesn't use this range as address, so in practice
>the problem does not occur.

OK, I get it.

But it appears that my answer to your original question (why is there code 
in VFS to reject a negative file offset argument) is right: that code was 
designed for a POSIX interface, and putting the check there relieved all 
the drivers for POSIX files from some of the responsibility of 
implementing POSIX.

From what I see on the mailing list, people today are willing to open up 
the VFS layer to allow some quasi-POSIX file access.

Frankly, I'd much rather see /dev/kmem stick to the file metaphor and use 
some scheme that doesn't involve negative "file offsets."  There is 
probably a lot of generic file access code, now and in the future, that 
would appreciate it.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems 


* Re: negative seek offsets in VFS
  2005-05-27 10:43       ` Andi Kleen
  2005-05-27 18:39         ` Bryan Henderson
@ 2005-05-28 12:37         ` Jamie Lokier
  2005-05-30  9:32           ` Andi Kleen
  1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2005-05-28 12:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Bryan Henderson, linux-fsdevel, Paul Taysom, viro

Andi Kleen wrote:
> The linux system call interface on most (all?) architectures

Actually that's not true on many architectures, including most of the
64-bit ones.  See (e.g.) the Alpha, MIPS32, MIPS64, IA64, PPC32 and
PPC64 versions of <asm/unistd.h>, which use a separate register to
indicate an error return.

That's the reason for the `force_successful_syscall_return' macro,
defined in <linux/ptrace.h>:

	/*
	 * System call handlers that, upon successful completion, need
	 * to return a negative value should call
	 * force_successful_syscall_return() right before returning.
	 * On architectures where the syscall convention provides for
	 * a separate error flag (e.g., alpha, ia64, ppc{,64},
	 * sparc{,64}, possibly others), this macro can be used to
	 * ensure that the error flag will not get set.  On
	 * architectures which do not support a separate error flag,
	 * the macro is a no-op and the spurious error condition needs
	 * to be filtered out by some other means (e.g., in
	 * user-level, by passing an extra argument to the syscall
	 * handler, or something along those lines).
	 */

> reserves -1 to -4095 for error returns. When such an error is detected
> it is converted to errno and -1. This applies to all system
> calls.
> 
> Take a look at unistd.h of your favourite architecture if you
> don't believe me.

Most unistd.h's are wrong by now, as they don't test against -4095.

Example, from <asm-x86_64/unistd.h>:

	if ((unsigned long)(res) >= (unsigned long)(-127)) { \
		errno = -(res); \
		res = -1; \
	} \
	return (type) (res); \

From <asm-x86_64/errno.h>:

	#include <asm-generic/errno.h>

From <asm-generic/errno.h>:

	#define	ECANCELED	125	/* Operation Canceled */
	#define	ENOKEY		126	/* Required key not available */
	#define	EKEYEXPIRED	127	/* Key has expired */
	#define	EKEYREVOKED	128	/* Key has been revoked */
	#define	EKEYREJECTED	129	/* Key was rejected by service */

Spot the inconsistency :)

x86_64 tests against -127; SH64 against -125; etc.  Most of the ones I
looked at are wrong, they differ from each other (despite using
<asm-generic/errno.h>), and most of them have a comment preceding the
test which states yet another different wrong number.
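
To spell out the consequence, here is a small sketch in C of the
comparison those macros perform, with the limit made a parameter.
treated_as_error() is a hypothetical helper written for this
illustration, not something from any unistd.h.

```c
/*
 * Return nonzero if a raw syscall return value would be treated as an
 * error by a wrapper that compares against -limit, in the same unsigned
 * style as the macros quoted above.
 */
static int treated_as_error(long raw, unsigned long limit)
{
	return (unsigned long)raw >= (unsigned long)-limit;
}
```

With limit 127, a raw return of -128 (that is, -EKEYREVOKED) is not
recognised as an error and would be handed back to the caller as if it
were a successful result; with limit 4095 it is caught.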

Example, from <asm-s390/unistd.h>:

	/* user-visible error numbers are in the range -1 - -122:
	   see <asm-s390/errno.h> */

	#define __syscall_return(type, res)			     \
	do {							     \
		if ((unsigned long)(res) >= (unsigned long)(-125)) { \

-- Jamie


* Re: negative seek offsets in VFS
  2005-05-28 12:37         ` Jamie Lokier
@ 2005-05-30  9:32           ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-30  9:32 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Bryan Henderson, linux-fsdevel, Paul Taysom, viro

On Sat, May 28, 2005 at 01:37:31PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > The linux system call interface on most (all?) architectures
> 
> Actually that's not true on many architectures, including most of the
> 64-bit ones.  See (e.g.) the Alpha, MIPS32, MIPS64, IA64, PPC32 and
> PPC64 versions of <asm/unistd.h>, which use a separate register to
> indicate an error return.
> 
> That's the reason for the `force_successful_syscall_return' macro,
> defined in <linux/ptrace.h>:
> 
> 	/*
> 	 * System call handlers that, upon successful completion, need
> 	 * to return a negative value should call
> 	 * force_successful_syscall_return() right before returning.
> 	 * On architectures where the syscall convention provides for
> 	 * a separate error flag (e.g., alpha, ia64, ppc{,64},
> 	 * sparc{,64}, possibly others), this macro can be used to
> 	 * ensure that the error flag will not get set.  On
> 	 * architectures which do not support a separate error flag,
> 	 * the macro is a no-op and the spurious error condition needs
> 	 * to be filtered out by some other means (e.g., in
> 	 * user-level, by passing an extra argument to the syscall
> 	 * handler, or something along those lines).
> 	 */
> 
> > reserves -1 to -4095 for error returns. When such an error is detected
> > it is converted to errno and -1. This applies to all system
> > calls.
> > 
> > Take a look at unistd.h of your favourite architecture if you
> > don't believe me.
> 
> Most unistd.h's are wrong by now, as they don't test against -4095.

You are right, but at least glibc tests against -4095.

I will fix x86-64.

> 
> From <asm-generic/errno.h>:
> 
> 	#define	ECANCELED	125	/* Operation Canceled */
> 	#define	ENOKEY		126	/* Required key not available */
> 	#define	EKEYEXPIRED	127	/* Key has expired */
> 	#define	EKEYREVOKED	128	/* Key has been revoked */
> 	#define	EKEYREJECTED	129	/* Key was rejected by service */
> 
> Spot the inconsistency :)

Very nasty indeed.  But fortunately nearly nobody uses the unistd.h
macros anyway.

-Andi


* Re: negative seek offsets in VFS
@ 2005-05-26 14:29 Paul Taysom
  0 siblings, 0 replies; 21+ messages in thread
From: Paul Taysom @ 2005-05-26 14:29 UTC (permalink / raw)
  To: ak, hbryan; +Cc: linux-fsdevel, viro

The addresses returned from /proc/kallsyms on the x86_64 are negative and when
I print the address of a kernel variable with "%p" it comes out negative.

Paul Taysom

>>> Bryan Henderson <hbryan@us.ibm.com> 5/25/2005 6:56:26 PM >>>
>My x86-64 users are complaining again that they cannot reach kernel
>text addresses in /dev/kmem. The reason is that they are negative and
>the VFS read and seek code just EINVALs them.

Come now -- the kernel addresses are not negative, and neither is any file 
offset.

You apparently mean that when you coerce a kernel address which exceeds 
the range of a file offset type into a file offset type, it comes out 
negative.

>I don't quite get why they are there anyways, the super block has 
>max file size field and checking against that should be enough for
>all the filesystems, no?

But this isn't about exceeding a maximum file size -- it's about exceeding 
the range of offsets that is representable in this C data type.

So I guess the real question is why is the loff_t type signed, thereby 
making it incapable of representing sufficiently large offsets?  The 
answer is that there are POSIX interfaces that overload a single data 
structure as both a file offset or size and a status code.  If a loff_t 
value is positive, it is a file offset, but if it's negative, it's a 
status code.  Consider lseek -- if you allowed a negative offset and just 
declared that it stands for the large positive offset you'd get if you 
coerced it to an unsigned 64 bit integer, then how would you tell a 
success from a failure in the return code?


* negative seek offsets in VFS
@ 2005-05-25 16:39 Andi Kleen
  2005-05-25 16:56 ` Trond Myklebust
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-25 16:39 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel

My x86-64 users are complaining again that they cannot reach kernel
text addresses in /dev/kmem. The reason is that they are negative and
the VFS read and seek code just EINVALs them. For seek I could
fix it in drivers/char/mem.c, but for read/pread/write etc.
it needs VFS changes.

I don't quite get why they are there anyways, the super block has 
max file size field and checking against that should be enough for
all the filesystems, no?

Opinions?

-Andi


* Re: negative seek offsets in VFS
  2005-05-25 16:39 Andi Kleen
@ 2005-05-25 16:56 ` Trond Myklebust
  2005-05-25 18:48   ` Andi Kleen
  2005-05-26  0:56 ` Bryan Henderson
  2005-05-26 15:15 ` Al Viro
  2 siblings, 1 reply; 21+ messages in thread
From: Trond Myklebust @ 2005-05-25 16:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: viro, linux-fsdevel

On Wed, 25.05.2005 at 18:39 (+0200), Andi Kleen wrote:
> My x86-64 users are complaining again that they cannot reach kernel
> text addresses in /dev/kmem. The reason is that they are negative and
> the VFS read and seek code just EINVALs them. For seek I could
> fix it in drivers/char/mem.c, but for read/pread/write etc.
> it needs VFS changes.
> 
> I don't quite get why they are there anyways, the super block has 
> max file size field and checking against that should be enough for
> all the filesystems, no?

Isn't /dev/kmem overriding the default llseek()?

AFAICS, drivers/char/mem.c defines "memory_lseek()" for precisely the
above reason.

Cheers,
  Trond



* Re: negative seek offsets in VFS
  2005-05-25 16:56 ` Trond Myklebust
@ 2005-05-25 18:48   ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-25 18:48 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: viro, linux-fsdevel

On Wed, May 25, 2005 at 12:56:44PM -0400, Trond Myklebust wrote:
> On Wed, 25.05.2005 at 18:39 (+0200), Andi Kleen wrote:
> > My x86-64 users are complaining again that they cannot reach kernel
> > text addresses in /dev/kmem. The reason is that they are negative and
> > the VFS read and seek code just EINVALs them. For seek I could
> > fix it in drivers/char/mem.c, but for read/pread/write etc.
> > it needs VFS changes.
> > 
> > I don't quite get why they are there anyways, the super block has 
> > max file size field and checking against that should be enough for
> > all the filesystems, no?
> 
> Isn't /dev/kmem overriding the default llseek()?
> 
> AFAICS, drivers/char/mem.c defines "memory_lseek()" for precisely the
> above reason.

Yes, but that is not enough. read/write have these checks too,
even pread/pwrite.

-Andi


* Re: negative seek offsets in VFS
  2005-05-25 16:39 Andi Kleen
  2005-05-25 16:56 ` Trond Myklebust
@ 2005-05-26  0:56 ` Bryan Henderson
  2005-05-26 19:20   ` Andi Kleen
  2005-05-26 15:15 ` Al Viro
  2 siblings, 1 reply; 21+ messages in thread
From: Bryan Henderson @ 2005-05-26  0:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-fsdevel, viro

>My x86-64 users are complaining again that they cannot reach kernel
>text addresses in /dev/kmem. The reason is that they are negative and
>the VFS read and seek code just EINVALs them.

Come now -- the kernel addresses are not negative, and neither is any file 
offset.

You apparently mean that when you coerce a kernel address which exceeds 
the range of a file offset type into a file offset type, it comes out 
negative.

>I don't quite get why they are there anyways, the super block has 
>max file size field and checking against that should be enough for
>all the filesystems, no?

But this isn't about exceeding a maximum file size -- it's about exceeding 
the range of offsets that is representable in this C data type.

So I guess the real question is why is the loff_t type signed, thereby 
making it incapable of representing sufficiently large offsets?  The 
answer is that there are POSIX interfaces that overload a single data 
structure as both a file offset or size and a status code.  If a loff_t 
value is positive, it is a file offset, but if it's negative, it's a 
status code.  Consider lseek -- if you allowed a negative offset and just 
declared that it stands for the large positive offset you'd get if you 
coerced it to an unsigned 64 bit integer, then how would you tell a 
success from a failure in the return code?
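
To illustrate the ambiguity, here is a minimal sketch in C. It uses
int64_t as a stand-in for loff_t, and both helper names are hypothetical,
made up for this example.

```c
#include <stdint.h>

/*
 * View a 64-bit kernel address as a signed file offset. int64_t stands
 * in for loff_t here; the conversion assumes the usual two's-complement
 * behaviour.
 */
static int64_t address_as_offset(uint64_t kernel_addr)
{
	return (int64_t)kernel_addr;
}

/*
 * lseek() reports failure by returning -1, so an address that maps to
 * offset -1 cannot be told apart from an error by the return value alone.
 */
static int collides_with_lseek_failure(uint64_t kernel_addr)
{
	return address_as_offset(kernel_addr) == -1;
}
```

An address in the upper half of the 64-bit range comes out as a negative
offset, and the very last byte of the address space comes out as exactly
the value lseek() uses to signal failure.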


* Re: negative seek offsets in VFS
  2005-05-26  0:56 ` Bryan Henderson
@ 2005-05-26 19:20   ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-26 19:20 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, viro

On Wed, May 25, 2005 at 05:56:26PM -0700, Bryan Henderson wrote:
> >My x86-64 users are complaining again that they cannot reach kernel
> >text addresses in /dev/kmem. The reason is that they are negative and
> >the VFS read and seek code just EINVALs them.
> 
> Come now -- the kernel addresses are not negative, and neither is any file 
> offset.

x86-64 addresses are negative.

> 
> You apparently mean that when you coerce a kernel address which exceeds 
> the range of a file offset type into a file offset type, it comes out 
> negative.

No I meant what I wrote.

> So I guess the real question is why is the loff_t type signed, thereby 
> making it incapable of representing sufficiently large offsets?  The 
> answer is that there are POSIX interfaces that overload a single data 
> structure as both a file offset or size and a status code.  If a loff_t 
> value is positive, it is a file offset, but if it's negative, it's a 
> status code.  Consider lseek -- if you allowed a negative offset and just 
> declared that it stands for the large positive offset you'd get if you 
> coerced it to an unsigned 64 bit integer, then how would you tell a 
> success from a failure in the return code?

The same as any other Linux kernel interfaces do it. The range from -4095 to -1
is reserved for error codes, the rest is free to use.
This is what glibc uses too. It has nothing to do with POSIX.

-Andi


* Re: negative seek offsets in VFS
  2005-05-25 16:39 Andi Kleen
  2005-05-25 16:56 ` Trond Myklebust
  2005-05-26  0:56 ` Bryan Henderson
@ 2005-05-26 15:15 ` Al Viro
  2005-05-26 15:29   ` Linus Torvalds
  2 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2005-05-26 15:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: viro, linux-fsdevel, Linus Torvalds

On Wed, May 25, 2005 at 06:39:05PM +0200, Andi Kleen wrote:
> 
> My x86-64 users are complaining again that they cannot reach kernel
> text addresses in /dev/kmem. The reason is that they are negative and
> the VFS read and seek code just EINVALs them. For seek I could
> fix it in drivers/char/mem.c, but for read/pread/write etc.
> it needs VFS changes.
> 
> I don't quite get why they are there anyways, the super block has 
> max file size field and checking against that should be enough for
> all the filesystems, no?

Most of the really bad cases are not in filesystems - devices tend to
be much more broken.  So no, check for max size doesn't help here.

We could add a check in
        if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
                goto Einval;
for "no, it's really OK to do that for this file" without harming the
fast path.

How about
	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
		if (!(file->f_mode & FMODE_ANY_OFFSET))
			goto Einval;

instead + adding
#define FMODE_ANY_OFFSET 16 /* we don't need any offset checks */
in fs.h + having kmem ->open() set it?

Linus, it's your code.  Do you have any objections to the above?


* Re: negative seek offsets in VFS
  2005-05-26 15:15 ` Al Viro
@ 2005-05-26 15:29   ` Linus Torvalds
  2005-05-26 19:25     ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2005-05-26 15:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Andi Kleen, viro, linux-fsdevel



On Thu, 26 May 2005, Al Viro wrote:
> 
> How about
>         if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> 		if (!(file->f_mode & FMODE_ANY_OFFSET))
> 			goto Einval;
> 
> instead + adding
> #define FMODE_ANY_OFFSET 16 /* we don't need any offset checks */
> in fs.h + having kmem ->open() set it?
> 
> Linus, it's your code.  Do you have any objections to the above?

No. But if so, the test should be changed: right now it not only detects 
negative offsets, it also detects wrap-around. And I think that 
wrap-around is _always_ wrong.

I don't think there is a "positive loff_t", so I guess the code would have 
to be something like

	/* Wraparound? */
	if ((u64)(pos + count) < count)
		goto Einval;

	if ((loff_t) (pos + count) < 0)
		if (!(file->f_mode & FMODE_ANY_OFFSET))
			goto Einval;

instead.
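
For experimenting outside the kernel, the two checks can be put into a
standalone function, roughly like the sketch below. check_rw_range() is
a hypothetical name, int64_t/uint64_t stand in for loff_t/u64, and the
addition is done in unsigned arithmetic because signed overflow is
undefined in plain C.

```c
#include <errno.h>
#include <stdint.h>

#define FMODE_ANY_OFFSET 16	/* value from the proposal quoted above */

/*
 * Userspace sketch of the proposed checks. Returns 0 if the range
 * [pos, pos + count) is acceptable, -EINVAL otherwise.
 */
static int check_rw_range(int64_t pos, uint64_t count, unsigned int f_mode)
{
	uint64_t end = (uint64_t)pos + count;	/* wraps modulo 2^64 */

	/* Wraparound is always wrong, flag or no flag. */
	if (end < count)
		return -EINVAL;

	/* A "negative" end offset is only allowed for flagged files. */
	if (end > (uint64_t)INT64_MAX && !(f_mode & FMODE_ANY_OFFSET))
		return -EINVAL;

	return 0;
}
```

Note that a negative pos needs no separate test: adding count to it
either leaves the end negative, which the second check handles, or wraps
past zero, which the first check rejects.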

			Linus


* Re: negative seek offsets in VFS
  2005-05-26 15:29   ` Linus Torvalds
@ 2005-05-26 19:25     ` Andi Kleen
  2005-05-26 19:39       ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-05-26 19:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, viro, linux-fsdevel

> I don't think there is a "positive loff_t", so I guess the code would have 
> to be something like
> 
> 	/* Wraparound? */
> 	if ((u64)(pos + count) < count)
> 		goto Einval;

Sometimes I think we should have a nice inline asm macro for that
that checks carry. 

> 
> 	if ((loff_t) (pos + count) < 0)
> 		if (!(file->f_mode & FMODE_ANY_OFFSET))
> 			goto Einval;

Looks good. But how to handle the broken devices viro worries about? 
I would prefer not to open any new security holes, but there is a bit too
much code to audit it all.  A flag would be possible, but ugly.

Any other choice?

-Andi


* Re: negative seek offsets in VFS
  2005-05-26 19:25     ` Andi Kleen
@ 2005-05-26 19:39       ` Linus Torvalds
  0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2005-05-26 19:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Al Viro, viro, linux-fsdevel



On Thu, 26 May 2005, Andi Kleen wrote:
>
> > I don't think there is a "positive loff_t", so I guess the code would have 
> > to be something like
> > 
> > 	/* Wraparound? */
> > 	if ((u64)(pos + count) < count)
> > 		goto Einval;
> 
> Sometimes I think we should have a nice inline asm macro for that
> that checks carry. 

Well, we'd need one in every size, and it's not common enough for us to 
care about performance, so..

> > 	if ((loff_t) (pos + count) < 0)
> > 		if (!(file->f_mode & FMODE_ANY_OFFSET))
> > 			goto Einval;
> 
> Looks good. But how to handle the broken devices viro worries about? 
> I would prefer not to open any new security holes, but it is a bit too
> much code to audit all.  Flag would be possible, but ugly.

I would _only_ ever set that FMODE_ANY_OFFSET for /dev/mem or other 
devices that are known to be ok. IOW, this is not something that ever gets 
set by default, and the user cannot set it.

		Linus

