Re: [RFC] fadvise: add more flags to provide a hint for block allocation

From: Dave Chinner <david@fromorbit.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Andreas Dilger <aedilger@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [RFC] fadvise: add more flags to provide a hint for block allocation
Date: Wed, 7 Mar 2012 23:11:38 +1100
Message-ID: <20120307121138.GK3592@dastard>
In-Reply-To: <yq1k42xhulg.fsf@sermon.lab.mkp.net>

On Wed, Mar 07, 2012 at 12:02:19AM -0500, Martin K. Petersen wrote:
> >>>>> "Andreas" == Andreas Dilger <aedilger@gmail.com> writes:
> 
> Andreas> This proposal definitely needs to have some clear explanation
> Andreas> of how the flags are intended to be used by applications, and
> Andreas> why they will help filesystems to improve allocation. 
> 
> This goes a bit deeper than just filesystem block allocation strategy.
> 
> With SMR drives lurking on the horizon it is becoming increasingly
> important for us to classify anticipated future access patterns as we
> send I/Os out to storage. We'll need something much smarter than just
> REQ_META for these devices. Tiered storage arrays and tiered flash also
> benefit from this information.

From what I've seen of the proposed SMR device standards, we're
going to have to redesign filesystem allocation policies completely
to use anything other than a single emulated random read/write
region in an SMR drive. Filesystems are going to need to know about
the different regions and their attributes to determine how they can
allocate space and what type of write IO can be directed to such
areas. For example, a filesystem that overwrites metadata in place
must use a random RW region for all its metadata - there is no other
choice. And regions that are append only cannot have their space
reused until all active data has been moved out of the entire region.
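
A minimal sketch of the two constraints above, written as a toy
in-memory model. The Region class, its fields and pick_region() are
all hypothetical names invented for illustration; nothing here
reflects an actual SMR standard or any filesystem's allocator.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical model of one region of an SMR drive."""
    size: int                  # capacity in blocks
    append_only: bool          # True: sequential-write only; False: random RW
    write_pointer: int = 0     # next writable block in an append-only region
    live: set = field(default_factory=set)  # blocks still holding active data

    def write(self, block: int) -> None:
        if not 0 <= block < self.size:
            raise ValueError("block outside region")
        if self.append_only:
            # Append-only regions accept writes only at the write pointer.
            if block != self.write_pointer:
                raise ValueError("append-only region: must write at write pointer")
            self.write_pointer += 1
        self.live.add(block)

    def invalidate(self, block: int) -> None:
        # The data was deleted or moved elsewhere. In an append-only
        # region this does not make the block writable again.
        self.live.discard(block)

    def reset(self) -> None:
        # Space is reclaimed only after all active data has been moved out.
        if self.live:
            raise RuntimeError("region still holds active data")
        self.write_pointer = 0


def pick_region(regions, overwrites_in_place: bool) -> Region:
    """Return the first region usable for the given write behaviour."""
    for region in regions:
        if overwrites_in_place and region.append_only:
            continue  # in-place updates need a random RW region
        if region.append_only and region.write_pointer >= region.size:
            continue  # a full append-only region takes no writes until reset
        return region
    raise LookupError("no suitable region")
```

In this model, metadata that is overwritten in place can only ever
land in a random RW region, and a full append-only region stays
unusable until every live block has been invalidated and reset() has
been called.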

From that perspective, I don't see fadvise as the best interface for
this - per-file access pattern/allocation policy information needs
to be kept persistent in the filesystem. Indeed, there is no end of
different allocation policies a filesystem could define, so I don't
think that enumerating them in fadvise() is a good thing to do. I'm
not sure that fallocate() is even the right place for this, though
it is a much better match for such extensions because it is for
persistent changes to file allocation ranges.
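
The advisory-versus-persistent distinction can be seen with the
interfaces that exist today. The sketch below assumes Linux and uses
only Python's os.posix_fadvise() and os.posix_fallocate() wrappers
with an existing advice value; it does not use any of the flags
proposed in this RFC, and compare_hint_and_allocation() is a
hypothetical helper name.

```python
import os
import tempfile


def compare_hint_and_allocation(length: int = 1 << 20):
    """Return (size_after_fadvise, size_after_fallocate) for a new temp file."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = tmp.fileno()
        # Advisory: tells the kernel what access pattern to expect.
        # The file's size and allocation are left untouched.
        os.posix_fadvise(fd, 0, length, os.POSIX_FADV_SEQUENTIAL)
        size_after_fadvise = os.fstat(fd).st_size
        # Persistent: space for the range is reserved in the file, and
        # the change is visible to anyone who later stats or opens it.
        os.posix_fallocate(fd, 0, length)
        size_after_fallocate = os.fstat(fd).st_size
    return size_after_fadvise, size_after_fallocate
```

Calling compare_hint_and_allocation() shows that the hint leaves the
file state unchanged, while the allocation call changes state that
is stored with the file.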

> There's lots of work going on in the standards space in this department
> right now and I was hoping we could spend some time discussing the
> current proposals in one of the plenary sessions at LSF. Ideally we'd
> tie fadvise() and any filesystem internal knowledge into appropriate
> storage hints at the bottom of the stack.

I didn't see much in the way of scope for hints at the bottom of the
stack for SMR devices - once the filesystem has allocated space in
the region for the given access type, there is no additional
information that needs to be supplied by the storage stack. I
suspect the same is true for tiered storage....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
