* Re: negative seek offsets in VFS
[not found] <s29588e0.089@sinclair.provo.novell.com>
@ 2005-05-26 17:49 ` Bryan Henderson
2005-05-26 19:23 ` Andi Kleen
0 siblings, 1 reply; 21+ messages in thread
From: Bryan Henderson @ 2005-05-26 17:49 UTC (permalink / raw)
To: Paul Taysom; +Cc: ak, linux-fsdevel, viro
>The addresses returned from /proc/kallsyms on the x86_64 are negative and when
>I print the address of a kernel variable with "%p" it comes out negative.
Wow. So can someone explain this? Does the x86_64 architecture actually
have a concept of negative addresses?
Ordinarily, I'd think this was a bug if I saw it, but I don't see how %p
could format a minus sign without it being deliberate.
In any case, I know POSIX doesn't have a concept of a negative absolute
file offset, so a kmem device cannot represent a negative address as a
negative file offset. A negative loff_t means something else.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 17:49 ` negative seek offsets in VFS Bryan Henderson
@ 2005-05-26 19:23 ` Andi Kleen
2005-05-26 21:17 ` Bryan Henderson
0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-05-26 19:23 UTC (permalink / raw)
To: Bryan Henderson; +Cc: Paul Taysom, linux-fsdevel, viro
On Thu, May 26, 2005 at 10:49:57AM -0700, Bryan Henderson wrote:
> >The addresses returned from /proc/kallsyms on the x86_64 are negative and when
> >I print the address of a kernel variable with "%p" it comes out negative.
>
> Wow. So can someone explain this? Does the x86_64 architecture actually
> have a concept of negative addresses?
Yes, it does. It has a 48-bit address space, and the 48th bit
is sign-extended to 64 bits. So you have a big hole in the middle
and positive and negative address spaces. The kernel uses the negative
half, user space the positive half.
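A minimal sketch of the sign extension described above, written as userspace C. The helper names `canonical48` and `is_negative_half` are hypothetical and exist only for this illustration; nothing here is taken from kernel source.

```c
#include <stdint.h>

/*
 * Sign-extend bit 47 (the 48th bit) of a virtual address into
 * bits 48..63, producing the canonical 64-bit form.
 */
static uint64_t canonical48(uint64_t va)
{
	const uint64_t low48 = va & 0x0000ffffffffffffULL;

	if (low48 & (1ULL << 47))
		return low48 | 0xffff000000000000ULL;
	return low48;
}

/*
 * An address lies in the "negative" half when its top bit is set,
 * i.e. when it is negative if read as a signed 64-bit integer.
 */
static int is_negative_half(uint64_t va)
{
	return (va >> 63) != 0;
}
```

For example, `canonical48(0x0000800000000000)` yields `0xffff800000000000`, which is negative when read as a signed 64-bit value, while anything below `0x0000800000000000` stays in the positive half.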
>
> Ordinarily, I'd think this was a bug if I saw it, but I don't see how %p
> could format a minus sign without it being deliberate.
It doesn't, but they are still negative.
> In any case, I know POSIX doesn't have a concept of a negative absolute
> file offset, so a kmem device cannot represent a negative address as a
> negative file offset. A negative loff_t means something else.
I don't think POSIX has anything to say about /dev/kmem.
-Andi
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 19:23 ` Andi Kleen
@ 2005-05-26 21:17 ` Bryan Henderson
2005-05-27 10:43 ` Andi Kleen
0 siblings, 1 reply; 21+ messages in thread
From: Bryan Henderson @ 2005-05-26 21:17 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, Paul Taysom, viro
>I don't think POSIX has anything to say about /dev/kmem.
Not directly, but since the entire point of /dev/kmem is to make kernel
memory accessible via the POSIX file interface, POSIX has plenty to say
about it.
In architectures with unsigned addresses, there's an obvious way to
implement /dev/kmem -- consider the memory address to be the file offset.
I don't see how you can do that with negative memory addresses. POSIX
doesn't have negative file offsets. There is nothing before the beginning
of a file. How does a POSIX program read address -1 via /dev/kmem?
>if you allowed a negative offset and just
>> declared that it stands for the large positive offset you'd get if you
>> coerced it to an unsigned 64 bit integer, then how would you tell a
>> success from a failure in the return code?
>
>The same as any other Linux kernel interfaces do it. The range from -4095 to -1
>is reserved for error codes, the rest is free to use.
I'm familiar with the ERR_PTR strategy for packing a kernel address and an
errno into a single word, but I haven't seen this for anything else. I'm
talking about the POSIX interface -- i.e. what the caller of the lseek() C
library routine would see.
I know we're looking at interfaces below the POSIX interface, but I don't
see how recognizing negative file offsets at this level is possible
without also recognizing them at the POSIX level.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 21:17 ` Bryan Henderson
@ 2005-05-27 10:43 ` Andi Kleen
2005-05-27 18:39 ` Bryan Henderson
2005-05-28 12:37 ` Jamie Lokier
0 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-27 10:43 UTC (permalink / raw)
To: Bryan Henderson; +Cc: linux-fsdevel, Paul Taysom, viro
> I'm familiar with the ERR_PTR strategy for packing a kernel address and an
> errno into a single word, but I haven't seen this for anything else. I'm
> talking about the POSIX interface -- i.e. what the caller of the lseek() C
> library routine would see.
I am not talking about kernel-internal interfaces like ERR_PTR.
The Linux system call interface on most (all?) architectures
reserves -1 to -4095 for error returns. When such an error is detected
it is converted to errno and -1. This applies to all system
calls.
Take a look at unistd.h of your favourite architecture if you
don't believe me.
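A minimal sketch of that conversion, assuming the raw system call result arrives as a `long`. The names `MAX_ERRNO_SKETCH` and `syscall_result` are hypothetical and chosen for this illustration; they are not the actual libc or kernel identifiers.

```c
#include <errno.h>

/* Hypothetical name: largest errno value the convention reserves. */
#define MAX_ERRNO_SKETCH 4095L

/*
 * Map a raw system call return value to the C library convention:
 * values in [-4095, -1] become errno plus a return of -1, and
 * everything else (including values below -4095) passes through.
 */
static long syscall_result(long raw)
{
	if ((unsigned long)raw >= (unsigned long)-MAX_ERRNO_SKETCH) {
		errno = (int)-raw;
		return -1;
	}
	return raw;
}
```

The unsigned comparison is a single test for the whole reserved range: only values from -4095 to -1 map to the top 4095 unsigned values.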
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-27 10:43 ` Andi Kleen
@ 2005-05-27 18:39 ` Bryan Henderson
2005-05-28 12:41 ` Jamie Lokier
2005-05-30 9:36 ` Andi Kleen
2005-05-28 12:37 ` Jamie Lokier
1 sibling, 2 replies; 21+ messages in thread
From: Bryan Henderson @ 2005-05-27 18:39 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, Paul Taysom, viro
>I am not talking about kernel internal interfaces like ERR_PTR.
>The Linux system call interface on most (all?) architectures
>reserves -1 to -4095 for error returns. When such an error is detected
>it is converted to errno and -1. This applies to all system
>calls.
OK, I remember that now. I probably forgot it because I couldn't think of
any use for that special range of values more negative than -4095. Can
you give an example of a system call that returns integers less than
-4095?
And separately, I'm still confused about how you expect /dev/kmem file
offsets to work. While I was mistaken about the existence of negative
addresses, I know there are no negative file offsets in POSIX. So how
does one look at a negative address via the POSIX file interface and
/dev/kmem?
Even if you add the concept of negative file offsets as a glibc/Linux
extension of POSIX, you have an ambiguity problem at least with offset -1,
right?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-27 18:39 ` Bryan Henderson
@ 2005-05-28 12:41 ` Jamie Lokier
2005-05-31 18:08 ` Bryan Henderson
2005-05-30 9:36 ` Andi Kleen
1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2005-05-28 12:41 UTC (permalink / raw)
To: Bryan Henderson; +Cc: Andi Kleen, linux-fsdevel, Paul Taysom, viro
Bryan Henderson wrote:
> OK, I remember that now. I probably forgot it because I couldn't think of
> any use for that special range of values more negative than -4095. Can
> you give an example of a system call that returns integers less than
> -4095?
fcntl(fd, F_GETOWN, 0) can.
See fs/fcntl.c.
So can ptrace() on some architectures.
-- Jamie
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-28 12:41 ` Jamie Lokier
@ 2005-05-31 18:08 ` Bryan Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2005-05-31 18:08 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Andi Kleen, linux-fsdevel, Paul Taysom, viro
>Bryan Henderson wrote:
>> OK, I remember that now. I probably forgot it because I couldn't think of
>> any use for that special range of values more negative than -4095. Can
>> you give an example of a system call that returns integers less than
>> -4095?
>
>fcntl(fd, F_GETOWN, 0) can.
Thanks. Actually, I was looking for an example that returns integers from
-1 to -4095 in the usual way (= errno) and values below -4095 meaning
something else, but this is even better.
So does fcntl() work on i386? From what I can see in the code, a -9
return from a fcntl(F_GETOWN) system call is ambiguous. It could mean
process group 9 or "bad file descriptor." Is that so? If so, is libc
able to resolve the ambiguity, or is the fcntl() libc call broken in this
arch?
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-27 18:39 ` Bryan Henderson
2005-05-28 12:41 ` Jamie Lokier
@ 2005-05-30 9:36 ` Andi Kleen
2005-05-31 18:33 ` Bryan Henderson
1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-05-30 9:36 UTC (permalink / raw)
To: Bryan Henderson; +Cc: linux-fsdevel, Paul Taysom, viro
On Fri, May 27, 2005 at 11:39:40AM -0700, Bryan Henderson wrote:
> OK, I remember that now. I probably forgot it because I couldn't think of
> any use for that special range of values more negative than -4095. Can
> you give an example of a system call that returns integers less than
> -4095?
Various ioctls for example can return arbitrary values.
>
> And separately, I'm still confused about how you expect /dev/kmem file
> offsets to work. While I was mistaken about the existence of negative
> addresses, I know there are no negative file offsets in POSIX. So how
> does one look at a negative address via the POSIX file interface and
> /dev/kmem?
pread,pwrite,lseek(...,SEEK_SET) should all work.
/dev/kmem is firmly outside POSIX, so this is fine.
For other devices the behaviour will not change.
>
> Even if you add the concept of negative file offsets as a glibc/Linux
> extension of POSIX, you have an ambiguity problem at least with offset -1,
> right?
The kernel doesn't use this range for addresses, so in practice
the problem does not occur.
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-30 9:36 ` Andi Kleen
@ 2005-05-31 18:33 ` Bryan Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Bryan Henderson @ 2005-05-31 18:33 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, Paul Taysom, viro
>pread,pwrite,lseek(...,SEEK_SET) should all work.
>
>/dev/kmem is firmly outside POSIX, so this is fine.
>For other devices the behaviour will not change.
>
>> Even if you add the concept of negative file offsets as a glibc/Linux
>> extension of POSIX, you have an ambiguity problem at least with offset -1,
>> right?
>
>The kernel doesn't use this range as address, so in practice
>the problem does not occur.
OK, I get it.
But it appears that my answer to your original question (why is there code
in VFS to reject a negative file offset argument) is right: that code was
designed for a POSIX interface, and putting the check there relieved all
the drivers for POSIX files from some of the responsibility of
implementing POSIX.
From what I see on the mailing list, people today are willing to open up
the VFS layer to allow some quasi-POSIX file access.
Frankly, I'd much rather see /dev/kmem stick to the file metaphor and use
some scheme that doesn't involve negative "file offsets." There is
probably a lot of generic file access code, now and in the future, that
would appreciate it.
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-27 10:43 ` Andi Kleen
2005-05-27 18:39 ` Bryan Henderson
@ 2005-05-28 12:37 ` Jamie Lokier
2005-05-30 9:32 ` Andi Kleen
1 sibling, 1 reply; 21+ messages in thread
From: Jamie Lokier @ 2005-05-28 12:37 UTC (permalink / raw)
To: Andi Kleen; +Cc: Bryan Henderson, linux-fsdevel, Paul Taysom, viro
Andi Kleen wrote:
> The linux system call interface on most (all?) architectures
Actually that's not true on many architectures, including most of the
64-bit ones. See (e.g.) the Alpha, MIPS32, MIPS64, IA64, PPC32 and
PPC64 versions of <asm/unistd.h>, which use a separate register to
indicate an error return.
That's the reason for the `force_successful_syscall_return' macro,
defined in <linux/ptrace.h>:
/*
* System call handlers that, upon successful completion, need
* to return a negative value should call
* force_successful_syscall_return() right before returning.
* On architectures where the syscall convention provides for
* a separate error flag (e.g., alpha, ia64, ppc{,64},
* sparc{,64}, possibly others), this macro can be used to
* ensure that the error flag will not get set. On
* architectures which do not support a separate error flag,
* the macro is a no-op and the spurious error condition needs
* to be filtered out by some other means (e.g., in
* user-level, by passing an extra argument to the syscall
* handler, or something along those lines).
*/
> reserves -1 to -4095 for error returns. When such an error is detected
> it is converted to errno and -1. This applies to all system
> calls.
>
> Take a look at unistd.h of your favourite architecture if you
> don't believe me.
Most unistd.h's are wrong by now, as they don't test against -4095.
Example, from <asm-x86_64/unistd.h>:
	if ((unsigned long)(res) >= (unsigned long)(-127)) { \
		errno = -(res); \
		res = -1; \
	} \
	return (type) (res); \
From <asm-x86_64/errno.h>:
#include <asm-generic/errno.h>
From <asm-generic/errno.h>:
#define ECANCELED 125 /* Operation Canceled */
#define ENOKEY 126 /* Required key not available */
#define EKEYEXPIRED 127 /* Key has expired */
#define EKEYREVOKED 128 /* Key has been revoked */
#define EKEYREJECTED 129 /* Key was rejected by service */
Spot the inconsistency :)
x86_64 tests against -127; SH64 against -125; etc. Most of the ones I
looked at are wrong, they differ from each other (despite using
<asm-generic/errno.h>), and most of them have a comment preceding the
test which states yet another wrong number.
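A small sketch of why the threshold matters, using the same unsigned comparison as the quoted macro but with the limit passed in as a parameter. The helper name `is_error_return` is hypothetical; the errno values come from the <asm-generic/errno.h> excerpt above.

```c
/*
 * Report whether a raw return value would be treated as an error
 * when the macro tests against -max_errno.
 */
static int is_error_return(long res, long max_errno)
{
	return (unsigned long)res >= (unsigned long)(-max_errno);
}
```

With a limit of 127, a return of -129 (EKEYREJECTED) is not recognised as an error and would be handed back to the caller as an ordinary negative result; with a limit of 4095 it is recognised.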
Example, from <asm-s390/unistd.h>:
/* user-visible error numbers are in the range -1 - -122:
see <asm-s390/errno.h> */
#define __syscall_return(type, res) \
do { \
if ((unsigned long)(res) >= (unsigned long)(-125)) { \
-- Jamie
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-28 12:37 ` Jamie Lokier
@ 2005-05-30 9:32 ` Andi Kleen
0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-30 9:32 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Bryan Henderson, linux-fsdevel, Paul Taysom, viro
On Sat, May 28, 2005 at 01:37:31PM +0100, Jamie Lokier wrote:
> Andi Kleen wrote:
> > The linux system call interface on most (all?) architectures
>
> Actually that's not true on many architectures, including most of the
> 64-bit ones. See (e.g.) the Alpha, MIPS32, MIPS64, IA64, PPC32 and
> PPC64 versions of <asm/unistd.h>, which use a separate register to
> indicate an error return.
>
> That's the reason for the `force_successful_syscall_return' macro,
> defined in <linux/ptrace.h>:
>
> /*
> * System call handlers that, upon successful completion, need
> * to return a negative value should call
> * force_successful_syscall_return() right before returning.
> * On architectures where the syscall convention provides for
> * a separate error flag (e.g., alpha, ia64, ppc{,64},
> * sparc{,64}, possibly others), this macro can be used to
> * ensure that the error flag will not get set. On
> * architectures which do not support a separate error flag,
> * the macro is a no-op and the spurious error condition needs
> * to be filtered out by some other means (e.g., in
> * user-level, by passing an extra argument to the syscall
> * handler, or something along those lines).
> */
>
> > reserves -1 to -4095 for error returns. When such an error is detected
> > it is converted to errno and -1. This applies to all system
> > calls.
> >
> > Take a look at unistd.h of your favourite architecture if you
> > don't believe me.
>
> Most unistd.h's are wrong by now, as they don't test against -4095.
You are right, but at least glibc tests against -4095.
I will fix x86-64.
>
> From <asm-generic/errno.h>:
>
> #define ECANCELED 125 /* Operation Canceled */
> #define ENOKEY 126 /* Required key not available */
> #define EKEYEXPIRED 127 /* Key has expired */
> #define EKEYREVOKED 128 /* Key has been revoked */
> #define EKEYREJECTED 129 /* Key was rejected by service */
>
> Spot the inconsistency :)
Very nasty indeed. But fortunately almost nobody uses the unistd.h
macros anyway.
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
@ 2005-05-26 14:29 Paul Taysom
0 siblings, 0 replies; 21+ messages in thread
From: Paul Taysom @ 2005-05-26 14:29 UTC (permalink / raw)
To: ak, hbryan; +Cc: linux-fsdevel, viro
The addresses returned from /proc/kallsyms on the x86_64 are negative and when
I print the address of a kernel variable with "%p" it comes out negative.
Paul Taysom
>>> Bryan Henderson <hbryan@us.ibm.com> 5/25/2005 6:56:26 PM >>>
>My x86-64 users are complaining again that they cannot reach kernel
>text addresses in /dev/kmem. The reason is that they are negative and
>the VFS read and seek code just EINVALs them.
Come now -- the kernel addresses are not negative, and neither is any file
offset.
You apparently mean that when you coerce a kernel address which exceeds
the range of a file offset type into a file offset type, it comes out
negative.
>I don't quite get why they are there anyways, the super block has
>max file size field and checking against that should be enough for
>all the filesystems, no?
But this isn't about exceeding a maximum file size -- it's about exceeding
the range of offsets that is representable in this C data type.
So I guess the real question is why is the loff_t type signed, thereby
making it incapable of representing sufficiently large offsets? The
answer is that there are POSIX interfaces that overload a single data
structure as both a file offset or size and a status code. If a loff_t
value is positive, it is a file offset, but if it's negative, it's a
status code. Consider lseek -- if you allowed a negative offset and just
declared that it stands for the large positive offset you'd get if you
coerced it to an unsigned 64 bit integer, then how would you tell a
success from a failure in the return code?
^ permalink raw reply [flat|nested] 21+ messages in thread
* negative seek offsets in VFS
@ 2005-05-25 16:39 Andi Kleen
2005-05-25 16:56 ` Trond Myklebust
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-25 16:39 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel
My x86-64 users are complaining again that they cannot reach kernel
text addresses in /dev/kmem. The reason is that they are negative and
the VFS read and seek code just EINVALs them. For seek I could
fix it in drivers/char/mem.c, but for read/pread/write etc.
it needs VFS changes.
I don't quite get why the checks are there anyway; the super block has
a max file size field and checking against that should be enough for
all the filesystems, no?
Opinions?
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-25 16:39 Andi Kleen
@ 2005-05-25 16:56 ` Trond Myklebust
2005-05-25 18:48 ` Andi Kleen
2005-05-26 0:56 ` Bryan Henderson
2005-05-26 15:15 ` Al Viro
2 siblings, 1 reply; 21+ messages in thread
From: Trond Myklebust @ 2005-05-25 16:56 UTC (permalink / raw)
To: Andi Kleen; +Cc: viro, linux-fsdevel
On Wed, 25.05.2005 at 18:39 (+0200), Andi Kleen wrote:
> My x86-64 users are complaining again that they cannot reach kernel
> text addresses in /dev/kmem. The reason is that they are negative and
> the VFS read and seek code just EINVALs them. For seek I could
> fix it in drivers/char/mem.c, but for read/pread/write etc.
> it needs VFS changes.
>
> I don't quite get why they are there anyways, the super block has
> max file size field and checking against that should be enough for
> all the filesystems, no?
Isn't /dev/kmem overriding the default llseek()?
AFAICS, drivers/char/mem.c defines "memory_lseek()" for precisely the
above reason.
Cheers,
Trond
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-25 16:56 ` Trond Myklebust
@ 2005-05-25 18:48 ` Andi Kleen
0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-25 18:48 UTC (permalink / raw)
To: Trond Myklebust; +Cc: viro, linux-fsdevel
On Wed, May 25, 2005 at 12:56:44PM -0400, Trond Myklebust wrote:
> On Wed, 25.05.2005 at 18:39 (+0200), Andi Kleen wrote:
> > My x86-64 users are complaining again that they cannot reach kernel
> > text addresses in /dev/kmem. The reason is that they are negative and
> > the VFS read and seek code just EINVALs them. For seek I could
> > fix it in drivers/char/mem.c, but for read/pread/write etc.
> > it needs VFS changes.
> >
> > I don't quite get why they are there anyways, the super block has
> > max file size field and checking against that should be enough for
> > all the filesystems, no?
>
> Isn't /dev/kmem overriding the default llseek()?
>
> AFAICS, drivers/char/mem.c defines "memory_lseek()" for precisely the
> above reason.
Yes, but that is not enough. read/write have these checks too,
even pread/pwrite.
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-25 16:39 Andi Kleen
2005-05-25 16:56 ` Trond Myklebust
@ 2005-05-26 0:56 ` Bryan Henderson
2005-05-26 19:20 ` Andi Kleen
2005-05-26 15:15 ` Al Viro
2 siblings, 1 reply; 21+ messages in thread
From: Bryan Henderson @ 2005-05-26 0:56 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, viro
>My x86-64 users are complaining again that they cannot reach kernel
>text addresses in /dev/kmem. The reason is that they are negative and
>the VFS read and seek code just EINVALs them.
Come now -- the kernel addresses are not negative, and neither is any file
offset.
You apparently mean that when you coerce a kernel address which exceeds
the range of a file offset type into a file offset type, it comes out
negative.
>I don't quite get why they are there anyways, the super block has
>max file size field and checking against that should be enough for
>all the filesystems, no?
But this isn't about exceeding a maximum file size -- it's about exceeding
the range of offsets that is representable in this C data type.
So I guess the real question is why is the loff_t type signed, thereby
making it incapable of representing sufficiently large offsets? The
answer is that there are POSIX interfaces that overload a single data
structure as both a file offset or size and a status code. If a loff_t
value is positive, it is a file offset, but if it's negative, it's a
status code. Consider lseek -- if you allowed a negative offset and just
declared that it stands for the large positive offset you'd get if you
coerced it to an unsigned 64 bit integer, then how would you tell a
success from a failure in the return code?
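To make the collision concrete, here is a small userspace sketch, assuming a 64-bit signed offset type. The helpers `addr_as_offset` and `looks_like_lseek_failure` are hypothetical names invented for this illustration, and the sample addresses in the note below are only examples.

```c
#include <stdint.h>

/*
 * Reinterpret a 64-bit address as a signed 64-bit offset using
 * two's complement arithmetic, without relying on an
 * implementation-defined unsigned-to-signed conversion.
 */
static int64_t addr_as_offset(uint64_t addr)
{
	if (addr <= (uint64_t)INT64_MAX)
		return (int64_t)addr;
	return -(int64_t)(UINT64_MAX - addr) - 1;
}

/*
 * lseek() reports failure by returning -1, so a successful seek to
 * an offset that is itself -1 cannot be told apart from an error
 * by the return value alone.
 */
static int looks_like_lseek_failure(int64_t returned_offset)
{
	return returned_offset == -1;
}
```

An address such as `0xffffffff80000000` comes out as -2147483648, which is negative but distinguishable from a failure; only the very top address, `0xffffffffffffffff`, maps to -1 and collides with the failure return.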
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 0:56 ` Bryan Henderson
@ 2005-05-26 19:20 ` Andi Kleen
0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2005-05-26 19:20 UTC (permalink / raw)
To: Bryan Henderson; +Cc: linux-fsdevel, viro
On Wed, May 25, 2005 at 05:56:26PM -0700, Bryan Henderson wrote:
> >My x86-64 users are complaining again that they cannot reach kernel
> >text addresses in /dev/kmem. The reason is that they are negative and
> >the VFS read and seek code just EINVALs them.
>
> Come now -- the kernel addresses are not negative, and neither is any file
> offset.
x86-64 addresses are negative.
>
> You apparently mean that when you coerce a kernel address which exceeds
> the range of a file offset type into a file offset type, it comes out
> negative.
No I meant what I wrote.
> So I guess the real question is why is the loff_t type signed, thereby
> making it incapable of representing sufficiently large offsets? The
> answer is that there are POSIX interfaces that overload a single data
> structure as both a file offset or size and a status code. If a loff_t
> value is positive, it is a file offset, but if it's negative, it's a
> status code. Consider lseek -- if you allowed a negative offset and just
> declared that it stands for the large positive offset you'd get if you
> coerced it to an unsigned 64 bit integer, then how would you tell a
> success from a failure in the return code?
The same way any other Linux kernel interface does it. The range from -4095 to -1
is reserved for error codes, the rest is free to use.
This is what glibc uses too. It has nothing to do with POSIX.
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-25 16:39 Andi Kleen
2005-05-25 16:56 ` Trond Myklebust
2005-05-26 0:56 ` Bryan Henderson
@ 2005-05-26 15:15 ` Al Viro
2005-05-26 15:29 ` Linus Torvalds
2 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2005-05-26 15:15 UTC (permalink / raw)
To: Andi Kleen; +Cc: viro, linux-fsdevel, Linus Torvalds
On Wed, May 25, 2005 at 06:39:05PM +0200, Andi Kleen wrote:
>
> My x86-64 users are complaining again that they cannot reach kernel
> text addresses in /dev/kmem. The reason is that they are negative and
> the VFS read and seek code just EINVALs them. For seek I could
> fix it in drivers/char/mem.c, but for read/pread/write etc.
> it needs VFS changes.
>
> I don't quite get why they are there anyways, the super block has
> max file size field and checking against that should be enough for
> all the filesystems, no?
Most of the really bad cases are not in filesystems - devices tend to
be much more broken. So no, check for max size doesn't help here.
We could add a check in
	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
		goto Einval;
for "no, it's really OK to do that for this file" without harming the
fast path.
How about
	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
		if (!(file->f_mode & FMODE_ANY_OFFSET))
			goto Einval;
instead + adding
#define FMODE_ANY_OFFSET 16 /* we don't need any offset checks */
in fs.h + having kmem ->open() set it?
Linus, it's your code. Do you have any objections to the above?
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 15:15 ` Al Viro
@ 2005-05-26 15:29 ` Linus Torvalds
2005-05-26 19:25 ` Andi Kleen
0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2005-05-26 15:29 UTC (permalink / raw)
To: Al Viro; +Cc: Andi Kleen, viro, linux-fsdevel
On Thu, 26 May 2005, Al Viro wrote:
>
> How about
> if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> if (!(file->f_mode & FMODE_ANY_OFFSET))
> goto Einval;
>
> instead + adding
> #define FMODE_ANY_OFFSET 16 /* we don't need any offset checks */
> in fs.h + having kmem ->open() set it?
>
> Linus, it's your code. Do you have any objections to the above?
No. But if so, the test should be changed: right now it not only detects
negative offsets, it also detects wrap-around. And I think that
wrap-around is _always_ wrong.
I don't think there is a "positive loff_t", so I guess the code would have
to be something like
	/* Wraparound? */
	if ((u64)(pos + count) < count)
		goto Einval;
	if ((loff_t) (pos + count) < 0)
		if (!(file->f_mode & FMODE_ANY_OFFSET))
			goto Einval;
instead.
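A userspace sketch of the two checks proposed above, for experimenting outside the kernel. It assumes the `FMODE_ANY_OFFSET` value of 16 suggested earlier in the thread; the function name `check_rw_range` and its return convention are hypothetical, and the sum is computed in unsigned arithmetic so that the sketch itself avoids signed overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Value proposed in the thread; not an existing kernel flag. */
#define FMODE_ANY_OFFSET 16u

/*
 * Return 0 if the access [pos, pos + count) would be allowed,
 * or -1 where the kernel code would jump to Einval.
 */
static int check_rw_range(int64_t pos, size_t count, unsigned int f_mode)
{
	uint64_t end = (uint64_t)pos + (uint64_t)count;

	/* Wrap-around past the top of the 64-bit range is always rejected. */
	if (end < (uint64_t)count)
		return -1;

	/* A negative end offset is only allowed for flagged files. */
	if (end > (uint64_t)INT64_MAX && !(f_mode & FMODE_ANY_OFFSET))
		return -1;

	return 0;
}
```

Under these assumptions, `check_rw_range(-4096, 16, 0)` is rejected, the same call with `FMODE_ANY_OFFSET` is accepted, and `check_rw_range(-1, 2, FMODE_ANY_OFFSET)` is still rejected because it wraps around.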
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 15:29 ` Linus Torvalds
@ 2005-05-26 19:25 ` Andi Kleen
2005-05-26 19:39 ` Linus Torvalds
0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2005-05-26 19:25 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, viro, linux-fsdevel
> I don't think there is a "positive loff_t", so I guess the code would have
> to be something like
>
> /* Wraparound? */
> if ((u64)(pos + count) < count)
> goto Einval;
Sometimes I think we should have a nice inline asm macro for that
that checks carry.
>
> if ((loff_t) (pos + count) < 0)
> if (!(file->f_mode & FMODE_ANY_OFFSET))
> goto Einval;
Looks good. But how to handle the broken devices viro worries about?
I would prefer not to open any new security holes, but it is a bit too
much code to audit it all. A flag would be possible, but ugly.
Any other choice?
-Andi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: negative seek offsets in VFS
2005-05-26 19:25 ` Andi Kleen
@ 2005-05-26 19:39 ` Linus Torvalds
0 siblings, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2005-05-26 19:39 UTC (permalink / raw)
To: Andi Kleen; +Cc: Al Viro, viro, linux-fsdevel
On Thu, 26 May 2005, Andi Kleen wrote:
>
> > I don't think there is a "positive loff_t", so I guess the code would have
> > to be something like
> >
> > /* Wraparound? */
> > if ((u64)(pos + count) < count)
> > goto Einval;
>
> Sometimes I think we should have a nice inline asm macro for that
> that checks carry.
Well, we'd need one in every size, and it's not common enough for us to
care about performance, so..
> > if ((loff_t) (pos + count) < 0)
> > if (!(file->f_mode & FMODE_ANY_OFFSET))
> > goto Einval;
>
> Looks good. But how to handle the broken devices viro worries about?
> I would prefer not to open any new security holes, but it is a bit too
> much code to audit all. Flag would be possible, but ugly.
I would _only_ ever set that FMODE_ANY_OFFSET for /dev/mem or other
devices that are known to be ok. IOW, this is not something that ever gets
set by default, and the user cannot set it.
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread