From: Sunil Mushran <sunil.mushran@oracle.com>
To: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [RFC] fadvise: add more flags to provide a hint for block allocation
Date: Mon, 05 Mar 2012 11:48:43 -0800
Message-ID: <4F55189B.4080507@oracle.com>
In-Reply-To: <20120305125029.GA5121@gmail.com>

On 03/05/2012 04:50 AM, Zheng Liu wrote:
> Hi list,
>
> Block allocation is a key component of a file system. Every file system tries
> to improve performance by optimizing the block allocation of a file. But no
> matter what the file system does, it is only guessing what the user expects,
> so it is not very accurate. fadvise(2) provides a way for the user to give a
> hint to the file system. However, only a few flags are provided so far. So we
> could provide more flags to tell the file system how to allocate blocks for a
> file.
>
> For example:
> we can add these flags into fadvise(2):
> FADV_ALLOC_READ_SEQ
> FADV_ALLOC_READ_RANDOM
> FADV_ALLOC_WRITE_ONCE
> FADV_ALLOC_WRITE_APPEND
>
> FADV_ALLOC_READ_* are not similar to FADV_SEQUENTIAL and FADV_RANDOM.
> FADV_ALLOC_READ_SEQ tells the file system that this file needs sequential
> blocks allocated, and FADV_ALLOC_READ_RANDOM tells the file system that this
> file can tolerate fragmentation.

File systems typically allocate the best layout they can for a file
at the time of write. Does _RANDOM mean "do not do that; find single
bits scattered around the disk"? If so, why would people use it? I mean,
random IOs are slow. What you are proposing is a further slowdown.
Hardly a feature that will be attractive to users.
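
For reference, the existing FADV_SEQUENTIAL and FADV_RANDOM hints that
the proposal contrasts itself with can be used from user space today.
A minimal sketch, assuming Linux and Python's os.posix_fadvise wrapper;
the helper name advise_access_pattern and the temporary file are
illustrative only. The proposed FADV_ALLOC_* flags do not exist and
are not used here.

```python
import os
import tempfile


def advise_access_pattern(fd, sequential):
    """Hint the expected read pattern for a whole file (hypothetical helper).

    On Linux these hints tune readahead; they do not change how
    blocks are allocated on disk.
    """
    advice = os.POSIX_FADV_SEQUENTIAL if sequential else os.POSIX_FADV_RANDOM
    # offset=0 with length=0 covers the file from start to end.
    os.posix_fadvise(fd, 0, 0, advice)
    return advice


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 4096)
    tmp.flush()
    seq_advice = advise_access_pattern(tmp.fileno(), sequential=True)
    rnd_advice = advise_access_pattern(tmp.fileno(), sequential=False)
```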

> FADV_ALLOC_WRITE_ONCE indicates that this file is written only once, so the
> file system can allocate sequential blocks for it to improve read
> performance. The FADV_ALLOC_WRITE_APPEND flag indicates that data will be
> appended to the end of this file, so the file system can reserve some blocks
> for it to keep the layout as sequential as possible.

Define ONCE. Is it one write(2)? I guess not. You probably mean
that once the file descriptor is closed, it will not be written
to. But we have no way of knowing how many writes there will be.
So it will be treated the same as APPEND. And file systems already
provide allocation reservation and/or delayed allocation to handle
APPEND write loads. So this flag does not offer much to the user
or the fs.
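
Separately from the in-kernel reservation and delayed allocation
mentioned above, an application that knows roughly how much it will
write can already request space explicitly. A minimal sketch, assuming
Linux and Python's os.posix_fallocate wrapper; the helper name
preallocate and the 1 MiB size are illustrative only. Note that this
is a different mechanism: posix_fallocate extends the visible file
size, whereas the file system's own reservation does not.

```python
import os
import tempfile


def preallocate(fd, nbytes):
    """Ask for nbytes of backing space from offset 0 (hypothetical helper)."""
    # After this call, writes into [0, nbytes) should not fail for lack of space.
    os.posix_fallocate(fd, 0, nbytes)
    return os.fstat(fd).st_size


with tempfile.NamedTemporaryFile() as tmp:
    size_after = preallocate(tmp.fileno(), 1 << 20)
```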