Re: Linux kernel file offset pointer races

From: Pavel Kankovsky <peak@argo.troja.mff.cuni.cz>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux kernel file offset pointer races
Date: Thu, 12 Aug 2004 23:38:13 +0200 (MET DST)
Message-ID: <20040812223057.CF9.0@argo.troja.mff.cuni.cz>
In-Reply-To: <1091796995.16306.20.camel@localhost.localdomain>

On Fri, 6 Aug 2004, Alan Cox wrote:

> On Wed, 2004-08-04 at 21:36, Pavel Kankovsky wrote:
> > IMHO, the proper fix is to serialize all operations modifying a shared
> > file pointer (file->f_pos): read(), readv(), write(), writev(),
> > lseek()/llseek(). As far as I can tell, this is required by POSIX:
> 
> Not if you want to get any useful work done. No Unix does this.

To be precise, I meant: serialize all operations modifying a shared file
pointer with respect to other operations modifying the *same* pointer.

> The situation with multiple parallel lseek/read/writes is somewhat
> undefined anyway since you don't know if the seek or the write
> occurred first in user space.

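char buffer[1000000];      /* fd is an open, writable regular file */

buffer[0] = 0;
lseek(fd, 0, SEEK_SET);
write(fd, buffer, 1);      /* byte 0 of the file is now 0 */
lseek(fd, 0, SEEK_SET);

if (fork() > 0) {
  /* parent: one big write() through the shared file pointer */
  buffer[0] = 1;
  write(fd, buffer, 1000000);
}
else {
  /* child: wait until the parent's write() has visibly started... */
  while (buffer[0] == 0)
    pread(fd, buffer, 1, 0);
  /* ...then move the shared pointer while that write() may still run */
  lseek(fd, 1234, SEEK_SET);
}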

lseek(...1234...) cannot occur before write(...1000000...) starts, but it
can occur before the big write() ends (unless write() is atomic).

There is a similar scenario with a big read() but it is somewhat more
complicated because it needs a piece of shared memory.
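A minimal sketch of that read() variant, under stated assumptions: the
helper name read_vs_lseek(), the temporary file under /tmp and the use of
an anonymous MAP_SHARED mapping as the "piece of shared memory" are all
illustrative choices, not something taken from the kernel or from the
original report. The child watches the shared buffer to learn that the
parent's read() has started, then moves the shared file pointer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the shared file offset left behind after a
 * big read() in the parent races with lseek(1234) in the child, or -1. */
off_t read_vs_lseek(size_t len)
{
    assert(len > 1234);

    char path[] = "/tmp/fpos-race-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* Anonymous MAP_SHARED memory stays shared across fork(). */
    char *buffer = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buffer == MAP_FAILED) {
        close(fd);
        return -1;
    }

    off_t pos = -1;
    ssize_t got;
    pid_t pid;

    /* File content is all 1s; the buffer is then cleared to all 0s. */
    memset(buffer, 1, len);
    if (write(fd, buffer, len) != (ssize_t)len ||
        lseek(fd, 0, SEEK_SET) != 0)
        goto out;
    memset(buffer, 0, len);

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* child: spin until read() has delivered the first byte */
        volatile char *first = buffer;
        while (*first == 0)
            ;
        lseek(fd, 1234, SEEK_SET);
        _exit(0);
    }

    /* parent: one big read() through the shared file pointer */
    got = read(fd, buffer, len);
    buffer[0] = 1;          /* let the child finish even if read() failed */
    waitpid(pid, NULL, 0);
    if (got > 0)
        pos = lseek(fd, 0, SEEK_CUR);
out:
    munmap(buffer, len);
    close(fd);
    return pos;
}
```

If read() and lseek() are serialized against each other, the call
read_vs_lseek(1000000) should return 1234. Without serialization the
pointer update made by read() may land after the lseek(), in which case
the result would be the number of bytes read instead.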

> O_APPEND is a bit different, as are pread/pwrite but those are dealt
> with using locking for files.

Two write()'s are serialized by the inode semaphore. But as far as I can
tell, there is no serialization between read()'s and write()'s. A read()
overlapping a simultaneous write() might produce inconsistent results,
e.g.:

1. read() starts at offset 0
2. read() reads a page of data and blocks
3. write() starts at offset 0
4. write() writes two pages of data and blocks
5. read() wakes up, reads two pages of data
6. write() wakes up, writes a page of data
7. read() finishes
8. write() finishes

In this scenario, the 1st and 3rd pages read by read() contain the old
data (before write()) but the 2nd page contains the new data (after
write()). This is absurd.
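One way to look for such mixed results from user space is sketched below.
Everything in it is an assumption made for illustration: the helper name
count_torn_reads(), the 4096-byte page size, the temporary file under /tmp,
and the use of pread()/pwrite() instead of read()/write() so that the
file-pointer race discussed earlier stays out of the picture. A child
keeps overwriting three pages with all 'A's or all 'B's; the parent
re-reads the same three pages and counts every result that contains both
letters.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { PAGE = 4096, NPAGES = 3, LEN = PAGE * NPAGES };  /* assumed sizes */

/* Hypothetical helper: number of reads that saw a mix of old and new
 * data, or -1 on error. */
long count_torn_reads(long iterations)
{
    assert(iterations > 0);

    static char wbuf[LEN], rbuf[LEN];
    char path[] = "/tmp/torn-read-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* Start from a consistent state: three pages of 'A'. */
    memset(wbuf, 'A', LEN);
    if (pwrite(fd, wbuf, LEN, 0) != LEN) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* writer: replace all three pages with 'B's, then 'A's, forever */
        for (char c = 'B';; c = (c == 'A') ? 'B' : 'A') {
            memset(wbuf, c, LEN);
            if (pwrite(fd, wbuf, LEN, 0) != LEN)
                _exit(1);
        }
    }

    long torn = 0;
    for (long i = 0; i < iterations; i++) {
        if (pread(fd, rbuf, LEN, 0) != LEN) {
            torn = -1;
            break;
        }
        /* A consistent snapshot is LEN copies of a single letter. */
        if (memchr(rbuf, rbuf[0] == 'A' ? 'B' : 'A', LEN) != NULL)
            torn++;
    }

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    close(fd);
    return torn;
}
```

A result of 0 from, say, count_torn_reads(100000) proves nothing about
atomicity; a positive result means at least one read() returned data from
two different write()s.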

BTW: What about writev() (esp. with O_APPEND)? It appears that the Linux
implementation makes it possible to interleave parts of a writev() with
other writes.
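A rough user-space check for this, again only a sketch under assumptions:
the names append_records() and count_mixed_records(), the 4096-byte
segment size and the temporary file under /tmp are made up for the
example. Two processes, each with its own O_APPEND descriptor, append
two-segment records with writev(); every byte of a record carries its
writer's tag ('p' or 'c'). Afterwards the file is scanned record by
record, and any record containing both tags means a writev() was split by
another write.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

enum { HALF = 4096, REC = 2 * HALF };   /* assumed segment/record sizes */

/* Append `rounds` two-segment records; every byte equals `tag`. */
static int append_records(int fd, char tag, int rounds)
{
    static char a[HALF], b[HALF];
    memset(a, tag, HALF);
    memset(b, tag, HALF);
    struct iovec iov[2] = { { a, HALF }, { b, HALF } };
    for (int i = 0; i < rounds; i++)
        if (writev(fd, iov, 2) != REC)
            return -1;
    return 0;
}

/* Hypothetical helper: number of records containing bytes from both
 * writers, or -1 on error. */
long count_mixed_records(int rounds)
{
    assert(rounds > 0);

    char path[] = "/tmp/writev-append-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_APPEND) < 0) {
        unlink(path);
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* child: separate open file description, also O_APPEND */
        int cfd = open(path, O_WRONLY | O_APPEND);
        _exit(cfd < 0 || append_records(cfd, 'c', rounds) < 0);
    }

    int rc = append_records(fd, 'p', rounds);
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    if (rc < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* Scan the file one record at a time. */
    long mixed = 0;
    char rec[REC];
    off_t off = 0;
    ssize_t n;
    while ((n = pread(fd, rec, REC, off)) == REC) {
        if (memchr(rec, rec[0] == 'p' ? 'c' : 'p', REC) != NULL)
            mixed++;
        off += REC;
    }
    close(fd);
    return n == 0 ? mixed : -1;
}
```

As with any race test, count_mixed_records(1000) returning 0 only shows
that no interleaving was observed on that run.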

Moreover, there appears to be a race condition between locks_verify_area()
and the actual I/O operation(s).

--Pavel Kankovsky aka Peak  [ Boycott Microsoft--http://www.vcnet.com/bms ]
"Resistance is futile. Open your source code and prepare for assimilation."

Thread overview: 8+ messages
     [not found] <Pine.LNX.4.44.0408041220550.26961-100000@isec.pl>
2004-08-04 20:36 ` Linux kernel file offset pointer races Pavel Kankovsky
2004-08-06 12:56   ` Alan Cox
2004-08-07 12:38     ` Amon Ott
2004-08-07 13:18       ` viro
2004-08-07 16:02         ` Amon Ott
2004-08-12 21:38     ` Pavel Kankovsky [this message]
2004-08-12 21:12       ` Alan Cox
2004-08-18 11:03         ` Pavel Kankovsky
