public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@zip.com.au>
To: Jeff Garzik <jgarzik@mandrakesoft.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: fadvise syscall?
Date: Sun, 17 Mar 2002 23:55:01 -0800
Message-ID: <3C959D55.14768770@zip.com.au>
In-Reply-To: <3C945635.4050101@mandrakesoft.com> <3C945A5A.9673053F@zip.com.au> <5.1.0.14.2.20020317131910.0522b490@pop.cus.cam.ac.uk> <3C959716.6040308@mandrakesoft.com>

Jeff Garzik wrote:
> 
> * fadvise(2) usefulness extends past open(2).  It may be useful to call
> it at various points during runtime.
> 
> * I think putting hints in open(2) is the wrong direction to go.  Hints
> have a potential to be very flexible.  open(2) O_xxx bits are not to be
> squandered lightly, while I see a lot more value in being a little more
> loose and free with the bit assignment for an "fadvise mask" (just a
> list of hint bits).  IMO it should be easier to introduce and retire
> hints, far easier than O_xxx flags.
> 

Yup.

posix_fadvise() looks to be a fine interface:

int posix_fadvise(int fd, off_t offset, size_t len, int advice);

 DESCRIPTION

     The posix_fadvise() function shall advise the implementation on
     the expected behavior of the application with respect to the data in
     the file associated with the open file descriptor, fd, starting at offset
     and continuing for len bytes. The specified range need not currently
     exist in the file. If len is zero, all data following offset is specified.
     The implementation may use this information to optimize handling
     of the specified data. The posix_fadvise() function shall have no
     effect on the semantics of other operations on the specified data,
     although it may affect the performance of other operations.

     The advice to be applied to the data is specified by the advice
     parameter and may be one of the following values:

     POSIX_FADV_NORMAL

          Specifies that the application has no advice to give on its
          behavior with respect to the specified data. It is the default
          characteristic if no advice is given for an open file.

     POSIX_FADV_SEQUENTIAL

          Specifies that the application expects to access the specified
          data sequentially from lower offsets to higher offsets.

     POSIX_FADV_RANDOM

          Specifies that the application expects to access the specified
          data in a random order.

     POSIX_FADV_WILLNEED

          Specifies that the application expects to access the specified
          data in the near future.

     POSIX_FADV_DONTNEED

          Specifies that the application expects that it will not access
          the specified data in the near future.

     POSIX_FADV_NOREUSE

          Specifies that the application expects to access the specified
          data once and then not reuse it thereafter.
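
To make the description above concrete, here is a minimal sketch of how an
application might use the interface for a one-pass streaming read.  The
helper name stream_once() and the buffer size are made up for illustration,
and the sketch assumes a libc that provides posix_fadvise().  The advice
calls are only hints, so their return values are deliberately ignored.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read fd to end-of-file once and return the number
 * of bytes read, or -1 if read() fails. */
long stream_once(int fd)
{
    char buf[65536];
    long total = 0;
    ssize_t n;

    /* offset 0, len 0: the hint covers the whole file.  A failure here
     * (for example on a pipe) does not affect correctness. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    if (n < 0)
        return -1;

    /* We do not expect to touch this data again soon. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return total;
}
```

Because the advice has no effect on semantics, the function returns the
same byte count whether or not the implementation honours the hints.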

We can usefully implement all of these.  FADV_WILLNEED obsoletes
sys_readahead().
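
As a rough illustration of the readahead-style use, the sketch below wraps
POSIX_FADV_WILLNEED for a byte range.  prefetch_range() is a hypothetical
name, not an existing API.  Note that posix_fadvise() reports failure by
returning an error number rather than by returning -1 and setting errno.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: hint that [offset, offset + len) will be needed
 * soon.  Returns 0 on success or an error number on failure. */
int prefetch_range(int fd, off_t offset, off_t len)
{
    int err = posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);

    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
    return err;
}
```

A caller would typically issue this shortly before it needs the data and
carry on regardless of the return value, since the hint is advisory.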

We'll need to cheat a bit on the offset/len thing for NORMAL and
SEQUENTIAL - just apply it to the whole file - we don't want to have to
attach an arbitrary number of silly range objects to each file for this.
(We already cheat a bit this way with msync).
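
The "cheat" can be pictured with a toy user-space model.  This is not
kernel code, and every name in it is hypothetical: the point is only that
the hint is stored as one flag per open file, and the offset and len
arguments are accepted but ignored.

```c
#include <stddef.h>

/* Toy model only: these types do not correspond to real kernel structures. */
enum access_mode { MODE_NORMAL, MODE_SEQUENTIAL };

struct open_file {
    enum access_mode mode;   /* a single flag, no list of ranges */
};

/* offset and len are part of the interface but deliberately unused:
 * the mode is applied to the whole file. */
void set_access_mode(struct open_file *f, long offset, size_t len,
                     enum access_mode mode)
{
    (void)offset;
    (void)len;
    f->mode = mode;
}
```

The cost of this shortcut is that a later call for a different range
overwrites the earlier hint instead of coexisting with it.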

Note that it applies to a file descriptor.  If posix_fadvise(FADV_DONTNEED) is
called against a file descriptor, and someone else has an fd open
against the same file, that other user gets their foot shot off.  That's
OK.

Given this, I don't see a persuasive need to implement a non-standard
interface.  It takes an off_t, so posix_fadvise64() is also needed.

The presence of this interface doesn't imply that we don't need
good dropbehind heuristics for streaming reads and writes.  We
do need those.

I wouldn't suggest that anyone rush out and implement this stuff for 2.5.
There's some decrudding needed in filemap.c first, and many of these
hints need to interact with the 2.6 VM.  Whatever that will be.

A 2.4 implementation could be done any time.  If anyone decides to
do this, please let me know...
