From: David Chinner <dgc@sgi.com>
To: David Chinner <dgc@sgi.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	xfs@oss.sgi.com, hch@infradead.org
Subject: Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
Date: Wed, 2 May 2007 12:26:44 +1000
Message-ID: <20070502022644.GO77450368@melbourne.sgi.com>
In-Reply-To: <20070501223040.GL5722@schatzie.adilger.int>

On Tue, May 01, 2007 at 03:30:40PM -0700, Andreas Dilger wrote:
> On May 01, 2007  14:22 +1000, David Chinner wrote:
> > On Mon, Apr 30, 2007 at 04:44:01PM -0600, Andreas Dilger wrote:
> > > Hmm, I'd thought "offline" would migrate to EXTENT_UNKNOWN, but I didn't
> > 
> > I disagree - why would you want to indicate the state is unknown when we know
> > very well that it is offline?
> 
> If you don't like "UNKNOWN", what about "UNMAPPED"?  I just want a
> catch-all flag that indicates "this extent contains data but there is
> nothing sensible to be returned for the extent mapping."

Yes, I like that much more. Good suggestion. ;)

> > Effectively, when your extent is offline in the HSM, it is inaccessible, and
> > you have to bring it back from tape so it becomes accessible again. i.e. some
> > action is necessary on behalf of the user to make it accessible. So I think
> > that OFFLINE is a good name for this state because it really is inaccessible.
> 
> What you are calling OFFLINE I would prefer to call UNMAPPED, since that
> can be used by applications as a catch-all for "no mapping".  There can
> be further flags that give refinements to UNMAPPED that some applications
> might care about (e.g. HSM_RESIDENT), but many users/apps will not
> if they just want the number of fragments in a given file.

Agreed - UNMAPPED does make a lot more sense in this case.
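
To make the "catch-all plus refinements" idea concrete, here is a minimal
sketch. The bit values, the sk_ prefix and the struct layout are all
hypothetical; the thread names the flags but assigns no numbers or layout.
An application that only cares whether an extent has a usable mapping
tests the catch-all bit and ignores any refinement bits:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define SK_EXTENT_UNMAPPED     0x1u  /* data exists, no sensible mapping */
#define SK_EXTENT_HSM_RESIDENT 0x2u  /* refinement of UNMAPPED */

/* Hypothetical extent record; not the layout proposed in the RFC. */
struct sk_extent {
    uint64_t logical;   /* offset within the file */
    uint64_t physical;  /* on-disk offset; meaningless if UNMAPPED */
    uint64_t length;
    uint32_t flags;
};

/* Count extents that carry a usable physical mapping.
 * Only the catch-all bit is tested; refinements are ignored. */
static size_t sk_count_mapped(const struct sk_extent *ext, size_t n)
{
    size_t mapped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(ext[i].flags & SK_EXTENT_UNMAPPED))
            mapped++;
    }
    return mapped;
}
```

An HSM-aware tool could additionally test SK_EXTENT_HSM_RESIDENT, while a
fragment counter never needs to know that the refinement exists.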

> > > Can you propose reasonable flag names for these (I can't think of anything
> > > very good) and a clear explanation of what they mean.  I suspect it will
> > > only be XFS that uses them initially.  In mke2fs and ext4+mballoc there is
> > > the concept of stripe unit and stripe width, but as yet they are not
> > > communicated between the two very well.  I'd be much happier if this info
> > > could be queried in a standard way from the block layer instead of the
> > > user having to specify it and the filesystem having to track it.
> > 
> > My preference is definitely for a separate ioctl to grab the
> > filesystem geometry so this stuff can be calculated in userspace.
> > i.e. the way XFS does it right now (XFS_IOC_FSGEOMETRY). I won't
> > bother trying to define names until we decide which approach we take
> > to implement this.
> 
> Hmm, previously you wrote "This information could be easily passed up in the
> flags fields if the filesystem has geometry information".  So, I _think_
> what you are saying is that you want 4 flags to convey this start/end
> alignment information, but the exact semantics of what a "stripe unit" and
> a "stripe width" is filesystem specific?

Right.

> I definitely do NOT want to get into any issues of querying the block
> device geometry here.  I was just making a passing comment that ext4+mballoc
> can already do RAID-specific allocation alignment, but it depends on the
> admin to specify this information and it would be nice if there was some
> easy way to get this from userspace/kernel interfaces.
> 
> Having an API that can request "tell me the number of blocks from this
> offset until the next physical disk boundary" or similar would be useful
> to any allocator, and the block layer already needs to know this when
> submitting IO.

The block layer knows this once you get inside the volume manager. I
think the issue is that there is no common export interface for this
information.
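
For reference, the query Andreas describes ("number of blocks from this
offset until the next physical disk boundary") is simple arithmetic once
the stripe unit is known. A minimal sketch, assuming the stripe unit is
expressed in filesystem blocks and that zero means "geometry unknown";
the function name is made up for this illustration:

```c
#include <stdint.h>

/* Blocks from offset_blocks up to the next stripe-unit boundary.
 * An offset already on a boundary has a full stripe unit ahead of it.
 * Returns 0 when the geometry is unknown (stripe unit of zero). */
static uint64_t sk_blocks_to_boundary(uint64_t offset_blocks,
                                      uint64_t stripe_unit_blocks)
{
    if (stripe_unit_blocks == 0)
        return 0;
    return stripe_unit_blocks - (offset_blocks % stripe_unit_blocks);
}
```

The hard part is not this calculation but obtaining stripe_unit_blocks in
a filesystem-independent way, which is the missing export interface.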

> > In XFS, mkfs.xfs does the work of getting this information
> > to set in the filesystem superblock. Here's the code for getting
> > sunit/swidth from the underlying block device:
> > 
> > http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfsprogs/libdisk/
> > 
> > Not much in common there ;)
> 
> It looks like this might be just what e2fsprogs needs also.

More than likely.

> > > It does make sense to specify zero for the fm_extent_count array and a
> > > new FIEMAP_FLAG_NO_EXTENTS to return only the count of extents and not the
> > > extent data itself, for the non-verbose mode of filefrag, and for
> > > pre-allocating a buffer large enough to hold the file if that is important.
> > 
> > Rather than rely on implicit behaviour of "pass in extent count of
> > zero and don't try to return any extents" to return the number of
> > extents on the file, why not just explicitly define this as a valid
> > input flag? i.e. FIEMAP_FLAG_GET_NUMEXTENTS
> 
> That's what I said, isn't it?  FIEMAP_FLAG_NO_EXTENTS.  I wonder if my
> clever-clever for "return no extents" and "return number of extents"
> is wasted :-/.

Too clever for an API, I think. ;)

My point is mainly that if you are going to use an API for a
specific function (e.g. query the number of extents) I think that
the API should have an obvious method for executing that specific
function. Using a command of "get no extents" to provide the query
of "how many extents in this file" is kind of obscure. When you read
the code it doesn't make a lot of sense, as opposed to seeing a
clear statement of intent from the code itself.

i.e. FIEMAP_FLAG_GET_NUMEXTENTS is self-documenting in both the API
and the code that uses it...
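
A rough sketch of what the explicit form could look like from the
handler's side. Everything here is hypothetical: the flag value, the
request struct and the toy handler are stand-ins, not the RFC's
definitions. Rejecting a zero extent count without the flag is one
possible policy, picked here only to show that no implicit meaning is
attached to zero:

```c
#include <errno.h>
#include <stdint.h>

#define SK_FLAG_GET_NUMEXTENTS 0x1u  /* hypothetical value */

/* Hypothetical request; a stand-in for the real ioctl argument. */
struct sk_request {
    uint32_t flags;
    uint32_t extent_count;  /* in: capacity of the caller's array */
    uint32_t result_count;  /* out: extents counted or returned */
};

/* Toy handler: file_extents stands in for the filesystem's answer. */
static int sk_handle(struct sk_request *rq, uint32_t file_extents)
{
    if (rq->flags & SK_FLAG_GET_NUMEXTENTS) {
        rq->result_count = file_extents;  /* count only, no extent data */
        return 0;
    }
    if (rq->extent_count == 0)
        return -EINVAL;  /* zero is not an implicit "count" request */

    rq->result_count = file_extents < rq->extent_count
                     ? file_extents : rq->extent_count;
    return 0;
}
```

A caller that wants the count sets the flag and reads result_count; the
intent is visible at the call site without knowing any special-case rule.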

> > > - does XFS return an extent for the metadata parts of the file (e.g. btree)?
> > 
> > No, but we can return the extent map for the attribute fork (i.e.
> > extended attrs) if asked for (XFS_IOC_GETBMAPA).
> 
> This seems like it would be a useful addition to the interface also, having
> FIEMAP_FLAG_METADATA request the return of metadata allocations too.

Agreed. The different types of requests need to be mutually
exclusive, though - returning the map of the attribute fork mixed
with the map of the data fork is going to be confusing....
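
The mutual exclusion could be enforced up front when the request flags
are validated. A minimal sketch, with hypothetical flag names and values
(only FIEMAP_FLAG_METADATA is named in this thread, and no value was
given for it); "no type flag" is assumed to mean the data fork:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical map-type selectors. */
#define SK_MAP_DATA      0x1u
#define SK_MAP_ATTR      0x2u
#define SK_MAP_METADATA  0x4u
#define SK_MAP_TYPE_MASK (SK_MAP_DATA | SK_MAP_ATTR | SK_MAP_METADATA)

/* Accept at most one map-type flag per request. */
static int sk_check_map_type(uint32_t flags)
{
    uint32_t type = flags & SK_MAP_TYPE_MASK;

    /* x & (x - 1) clears the lowest set bit, so the result is
     * non-zero exactly when more than one bit is set. */
    if (type & (type - 1))
        return -EINVAL;
    return 0;
}
```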

> > > - does XFS allow non-root users to call xfs_bmap on files they don't own, or
> > >   use by non-root users at all?
> > 
> > Users can run xfs_bmap on any file they have permission to
> > open(O_RDONLY).
> > 
> > >   The FIBMAP ioctl is for privileged users
> > >   only, and I wonder if FIEMAP should be the same, or at least disallow
> > >   mapping files that the user can't access especially with FLAG_SYNC and/or
> > >   FLAG_HSM_READ.
> > 
> > I see little reason for restricting FI[BE]MAP to privileged users -
> > anyone should be able to determine if files they have permission to
> > access are fragmented.
> 
> I think I agree with Anton that allowing some of the flags for non-privileged
> users seems dangerous.  I think this needs to be determined on a flag-by-flag
> basis, and -EPERM should be returned in some cases.

Agreed, but I'm yet to see any flags where I think that is
necessary.
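
If some flags did end up restricted, a flag-by-flag check could be as
small as the sketch below. The flag values are hypothetical, and marking
HSM_READ as privileged is purely illustrative; nothing in this thread
settles which flags, if any, need it:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for the flags mentioned in the thread. */
#define SK_FLAG_SYNC      0x1u
#define SK_FLAG_HSM_READ  0x2u

/* Illustrative policy only: which flags belong here is undecided. */
#define SK_PRIVILEGED_FLAGS (SK_FLAG_HSM_READ)

/* Return -EPERM if an unprivileged caller asks for a restricted flag. */
static int sk_check_perm(uint32_t flags, bool privileged)
{
    if (!privileged && (flags & SK_PRIVILEGED_FLAGS))
        return -EPERM;
    return 0;
}
```

With an empty SK_PRIVILEGED_FLAGS mask the check never fails, which
matches the position that no restriction is needed so far.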

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
