* Re: Linux kernel file offset pointer races [not found] <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl> @ 2004-08-04 20:36 ` Pavel Kankovsky 2004-08-06 12:56 ` Alan Cox 0 siblings, 1 reply; 8+ messages in thread From: Pavel Kankovsky @ 2004-08-04 20:36 UTC (permalink / raw) To: linux-kernel On Wed, 4 Aug 2004, Paul Starzetz wrote: > static ssize_t mtrr_read(struct file *file, char *buf, size_t len, > loff_t *ppos) > { > [1] if (*ppos >= ascii_buf_bytes) return 0; > [2] if (*ppos + len > ascii_buf_bytes) len = ascii_buf_bytes - *ppos; > if ( copy_to_user (buf, ascii_buffer + *ppos, len) ) return -EFAULT; > [3] *ppos += len; > return len; > } /* End Function mtrr_read */ > > It is quite easy to see that since copy_to_user can sleep, the second > reference to *ppos may use another value. Or in other words, code > operating on the file->f_pos variable through a pointer must be atomic > in respect to the current thread. We expect even more troubles in the > SMP case though. Oh no! Another old bug getting bored of being ignored and promoting itself into a security vulnerability (see Bugtraq/Full-Disclosure archive for a complete story). IMHO, the proper fix is to serialize all operations modifying a shared file pointer (file->f_pos): read(), readv(), write(), writev(), lseek()/llseek(). As far as I can tell, this is required by POSIX: [read() specification] On files that support seeking (for example, a regular file), the read() shall start at a position in the file given by the file offset associated with fildes. The file offset shall be incremented by the number of bytes actually read. [write() specification] On a regular file or other file capable of seeking, the actual writing of data shall proceed from the position in the file indicated by the file offset associated with fildes. Before successful return from write(), the file offset shall be incremented by the number of bytes actually written. [write() specification] If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation. Moreover, there should probably be some kind of serialization between groups of reads from a file (read: inode) and individual writes to the same file even among different struct file: [read() rationale] I/O is intended to be atomic to ordinary files and pipes and FIFOs. Atomic means that all the bytes from a single operation that started out together end up together, without interleaving from other I/O operations. [write() rationale] Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. --Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ] "Resistance is futile. Open your source code and prepare for assimilation." ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-04 20:36 ` Linux kernel file offset pointer races Pavel Kankovsky @ 2004-08-06 12:56 ` Alan Cox 2004-08-07 12:38 ` Amon Ott 2004-08-12 21:38 ` Pavel Kankovsky 0 siblings, 2 replies; 8+ messages in thread From: Alan Cox @ 2004-08-06 12:56 UTC (permalink / raw) To: Pavel Kankovsky; +Cc: Linux Kernel Mailing List On Mer, 2004-08-04 at 21:36, Pavel Kankovsky wrote: > IMHO, the proper fix is to serialize all operations modifying a shared > file pointer (file->f_pos): read(), readv(), write(), writev(), > lseek()/llseek(). As far as I can tell, this is required by POSIX: Not if you want to get any useful work done. No Unix does this. The situation with multiple parallel lseek/read/writes is somewhat undefined anyway since you don't know if the seek or the write occurred first in user space. O_APPEND is a bit different, as are pread/pwrite but those are dealt with using locking for files. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-06 12:56 ` Alan Cox @ 2004-08-07 12:38 ` Amon Ott 2004-08-07 13:18 ` viro 2004-08-12 21:38 ` Pavel Kankovsky 1 sibling, 1 reply; 8+ messages in thread From: Amon Ott @ 2004-08-07 12:38 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 878 bytes --] On Freitag, 6. August 2004 14:56, Alan Cox wrote: > On Mer, 2004-08-04 at 21:36, Pavel Kankovsky wrote: > > IMHO, the proper fix is to serialize all operations modifying a shared > > file pointer (file->f_pos): read(), readv(), write(), writev(), > > lseek()/llseek(). As far as I can tell, this is required by POSIX: > > Not if you want to get any useful work done. No Unix does this. The > situation with multiple parallel lseek/read/writes is somewhat undefined > anyway since you don't know if the seek or the write occurred first in > user space. Would it not be useful to have per-process or per-thread offsets? Do parallel processes really need to share the offset? E.g., the struct file could (optionally) be copied on fork with copy-on-write to avoid extra memory consumption. Amon. -- http://www.rsbac.org - GnuPG: 2048g/5DEAAA30 2002-10-22 [-- Attachment #2: signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-07 12:38 ` Amon Ott @ 2004-08-07 13:18 ` viro 2004-08-07 16:02 ` Amon Ott 0 siblings, 1 reply; 8+ messages in thread From: viro @ 2004-08-07 13:18 UTC (permalink / raw) To: Amon Ott; +Cc: linux-kernel On Sat, Aug 07, 2004 at 02:38:12PM +0200, Amon Ott wrote: > Would it not be useful to have per-process or per-thread offsets? Do > parallel processes really need to share the offset? > > E.g., the struct file could (optionally) be copied on fork with > copy-on-write to avoid extra memory consumption. (cat a; cat b) > /tmp/foo ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-07 13:18 ` viro @ 2004-08-07 16:02 ` Amon Ott 0 siblings, 0 replies; 8+ messages in thread From: Amon Ott @ 2004-08-07 16:02 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 930 bytes --] On Samstag, 7. August 2004 15:18, viro@parcelfarce.linux.theplanet.co.uk wrote: > On Sat, Aug 07, 2004 at 02:38:12PM +0200, Amon Ott wrote: > > Would it not be useful to have per-process or per-thread offsets? Do > > parallel processes really need to share the offset? > > > > E.g., the struct file could (optionally) be copied on fork with > > copy-on-write to avoid extra memory consumption. > > (cat a; cat b) > /tmp/foo Right, the process running (...) would not know what offset to give "cat b" as a start, because it cannot see "cat a"'s final offset. It could seek to the end of /tmp/foo before starting "cat b", but this requirement would break existing shells. Bad luck. As there are probably more examples why it does not work, there will probably have to be a macro for all offset changes, which checks for overruns. Amon. -- http://www.rsbac.org - GnuPG: 2048g/5DEAAA30 2002-10-22 [-- Attachment #2: signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-06 12:56 ` Alan Cox 2004-08-07 12:38 ` Amon Ott @ 2004-08-12 21:38 ` Pavel Kankovsky 2004-08-12 21:12 ` Alan Cox 1 sibling, 1 reply; 8+ messages in thread From: Pavel Kankovsky @ 2004-08-12 21:38 UTC (permalink / raw) To: Linux Kernel Mailing List On Fri, 6 Aug 2004, Alan Cox wrote: > On Mer, 2004-08-04 at 21:36, Pavel Kankovsky wrote: > > IMHO, the proper fix is to serialize all operations modifying a shared > > file pointer (file->f_pos): read(), readv(), write(), writev(), > > lseek()/llseek(). As far as I can tell, this is required by POSIX: > > Not if you want to get any useful work done. No Unix does this. ...serialize all operations modifying a shared file pointer wrt operations modifying the *same* pointer. > The situation with multiple parallel lseek/read/writes is somewhat > undefined anyway since you don't know if the seek or the write > occurred first in user space. buffer[0] = 0; lseek(fd, 0, SEEK_SET); write(fd, buffer, 1); lseek(fd, 0, SEEK_SET); if (fork() > 0) { buffer[0] = 1; write(fd, buffer, 1000000); } else { while (buffer[0] == 0) pread(fd, buffer, 1, 0); lseek(fd, 1234, SEEK_SET); } lseek(...1234...) cannot occur before write(...1000000) starts but it can occur before the big write() ends (unless write() is atomic). There is a similar scenario with a big read() but it is somewhat more complicated because it needs a piece of shared memory. > O_APPEND is a bit different, as are pread/pwrite but those are dealt > with using locking for files. Two write()'s are serialized by inode semaphore. But as far as can tell, there is no serialization between read()'s and write()'s. A read() overlapping a simultaneous write() might produce inconsistent results, e.g.: 1. read() starts at offset 0 2. read() reads a page of data and blocks 3. write() starts at offset 0 4. write() writes two pages of data and block 5. read() wakes up, reads two pages of data 6. write() wakes up, writes a page of data 7. read() finished 8. write() finished In this scenario, the 1st and 3rd pages read by read() contain the old data (before write()) but the 2nd page contains the new data (after write()). This is absurd. BTW: What about writev() (esp. with O_APPEND)? It appears Linux implementation makes it possible to interleave parts of writev() with other writes. Moreover, there appears to be a race condition between locks_verify_area() and the actual I/O operation(s). --Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ] "Resistance is futile. Open your source code and prepare for assimilation." ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-12 21:38 ` Pavel Kankovsky @ 2004-08-12 21:12 ` Alan Cox 2004-08-18 11:03 ` Pavel Kankovsky 0 siblings, 1 reply; 8+ messages in thread From: Alan Cox @ 2004-08-12 21:12 UTC (permalink / raw) To: Pavel Kankovsky; +Cc: Linux Kernel Mailing List On Iau, 2004-08-12 at 22:38, Pavel Kankovsky wrote: > In this scenario, the 1st and 3rd pages read by read() contain the old > data (before write()) but the 2nd page contains the new data (after > write()). This is absurd. Why ? > BTW: What about writev() (esp. with O_APPEND)? It appears Linux > implementation makes it possible to interleave parts of writev() with > other writes. If that can occur with O_APPEND it might be a bug. SuS does make some real guarantees about what O_APPEND means. > Moreover, there appears to be a race condition between locks_verify_area() > and the actual I/O operation(s). Details ? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Linux kernel file offset pointer races 2004-08-12 21:12 ` Alan Cox @ 2004-08-18 11:03 ` Pavel Kankovsky 0 siblings, 0 replies; 8+ messages in thread From: Pavel Kankovsky @ 2004-08-18 11:03 UTC (permalink / raw) To: Linux Kernel Mailing List On Thu, 12 Aug 2004, Alan Cox wrote: > > In this scenario, the 1st and 3rd pages read by read() contain the old > > data (before write()) but the 2nd page contains the new data (after > > write()). This is absurd. > > Why ? Ok, it is not absurd. It is insane. :) As far as I know, POSIX compliant read() and write() et al. are supposed to be serializable. > > BTW: What about writev() (esp. with O_APPEND)? It appears Linux > > implementation makes it possible to interleave parts of writev() with > > other writes. > > If that can occur with O_APPEND it might be a bug. SuS does make some > real guarantees about what O_APPEND means. The following scenario appears to be possible when the default implementation in do_readv_writev() is used: 1. process P1 calls writev(), writes the 1st chunk fs specific write operation grabs i_sem when it starts... and releases it when it finished 2. process P2 calls write() on the same file i_sem is free now, and P2 can grab it..and release it again 3. process P1 writes the 2nd chunk i_sem is free again, and P1 can grab it to write the 2nd chunk The good news is some filesystems use generic_file_writev() (ext2, ext3) or their own implementation (xfs) rather than the "supergeneric" code in do_readv_writev(). The bad news is some other filesystems (reiser) use the "supergeneric" code. > > Moreover, there appears to be a race condition between locks_verify_area() > > and the actual I/O operation(s). > > Details ? When I hold a mandatory lock on a file then no other process will be allowed to read from/write to (depending on the type of the lock) that file, right? The code (vfs_read(), vfs_write(), do_readv_writev()) in Linux appears to work as follows: 1. locks_verify_area() is called 2. file i/o is performed As far as I can tell, the process holds no locks covering both steps. Another process might call fcntl(...F_SETLK...) between steps 1 and 2, i.e. after locks_verify_area() said "ok" but before the actual i/o is finished, and the result would be one process doing i/o on a file while the other process holds a mandatory lock on the same file. --Pavel Kankovsky aka Peak [ Boycott Microsoft--http://www.vcnet.com/bms ] "Resistance is futile. Open your source code and prepare for assimilation." ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2004-08-18 11:03 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl>
2004-08-04 20:36 ` Linux kernel file offset pointer races Pavel Kankovsky
2004-08-06 12:56 ` Alan Cox
2004-08-07 12:38 ` Amon Ott
2004-08-07 13:18 ` viro
2004-08-07 16:02 ` Amon Ott
2004-08-12 21:38 ` Pavel Kankovsky
2004-08-12 21:12 ` Alan Cox
2004-08-18 11:03 ` Pavel Kankovsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox