From: Andrew Morton <akpm@zip.com.au>
To: Jeff Garzik <jgarzik@mandrakesoft.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: fadvise syscall?
Date: Sun, 17 Mar 2002 23:55:01 -0800
Message-ID: <3C959D55.14768770@zip.com.au>
In-Reply-To: <3C945635.4050101@mandrakesoft.com> <3C945A5A.9673053F@zip.com.au> <5.1.0.14.2.20020317131910.0522b490@pop.cus.cam.ac.uk> <3C959716.6040308@mandrakesoft.com>
Jeff Garzik wrote:
>
> * fadvise(2) usefulness extends past open(2). It may be useful to call
> it at various points during runtime.
>
> * I think putting hints in open(2) is the wrong direction to go. Hints
> have a potential to be very flexible. open(2) O_xxx bits are not to be
> squandered lightly, while I see a lot more value in being a little more
> loose and free with the bit assignment for an "fadvise mask" (just a
> list of hint bits). IMO it should be easier to introduce and retire
> hints, far easier than O_xxx flags.
>
Yup.
posix_fadvise() looks to be a fine interface:
    int posix_fadvise(int fd, off_t offset, size_t len, int advice);
DESCRIPTION
The posix_fadvise() function shall advise the implementation on
the expected behavior of the application with respect to the data in
the file associated with the open file descriptor, fd, starting at offset
and continuing for len bytes. The specified range need not currently
exist in the file. If len is zero, all data following offset is specified.
The implementation may use this information to optimize handling
of the specified data. The posix_fadvise() function shall have no
effect on the semantics of other operations on the specified data,
although it may affect the performance of other operations.
The advice to be applied to the data is specified by the advice
parameter and may be one of the following values:
POSIX_FADV_NORMAL
    Specifies that the application has no advice to give on its
    behavior with respect to the specified data. It is the default
    characteristic if no advice is given for an open file.
POSIX_FADV_SEQUENTIAL
    Specifies that the application expects to access the specified
    data sequentially from lower offsets to higher offsets.
POSIX_FADV_RANDOM
    Specifies that the application expects to access the specified
    data in a random order.
POSIX_FADV_WILLNEED
    Specifies that the application expects to access the specified
    data in the near future.
POSIX_FADV_DONTNEED
    Specifies that the application expects that it will not access
    the specified data in the near future.
POSIX_FADV_NOREUSE
    Specifies that the application expects to access the specified
    data once and then not reuse it thereafter.
We can usefully implement all of these. FADV_WILLNEED obsoletes
sys_readahead().
We'll need to cheat a bit on the offset/len thing for NORMAL and
SEQUENTIAL - just apply it to the whole file - we don't want to have to
attach an arbitrary number of silly range objects to each file for this.
(We already cheat a bit this way with msync).
Note that it applies to a file descriptor. If posix_fadvise(FADV_DONTNEED) is
called against a file descriptor, and someone else has an fd open
against the same file, that other user gets their foot shot off. That's
OK.
Given this, I don't see a persuasive need to implement a non-standard
interface. It takes an off_t, so posix_fadvise64() is also needed.
The presence of this interface doesn't imply that we don't need
good dropbehind heuristics for streaming reads and writes. We
do need those.
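For comparison, this is roughly what an application has to do by hand today to get dropbehind on a streaming read. It is a userspace sketch only, not the kernel heuristic asked for above; stream_and_drop() and the 64 KiB chunk size are arbitrary choices for the example.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (64 * 1024) /* arbitrary read size for this sketch */

/*
 * Hypothetical helper: read fd from its current position to EOF, telling
 * the kernel after each chunk that the bytes just consumed will not be
 * needed again. Returns the number of bytes read, or -1 if read() fails.
 * Failures of the hints themselves are ignored because they are advisory.
 */
long long stream_and_drop(int fd)
{
    char buf[CHUNK];
    off_t pos = lseek(fd, 0, SEEK_CUR); /* where this stream starts */
    long long total = 0;

    if (pos < 0)
        return -1;

    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break; /* end of file */

        /* A real consumer would process buf[0..n) here. */

        /* Drop the range we have just finished with. */
        (void)posix_fadvise(fd, pos, n, POSIX_FADV_DONTNEED);
        pos += n;
        total += n;
    }
    return total;
}
```

Good in-kernel heuristics would make the per-chunk DONTNEED call unnecessary for the common streaming case.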
I wouldn't suggest that anyone rush out and implement this stuff for 2.5.
There's some decrudding needed in filemap.c first, and many of these
hints need to interact with the 2.6 VM. Whatever that will be.
A 2.4 implementation could be done any time. If anyone decides to
do this, please let me know...