From: Mark Fasheh <mark.fasheh@oracle.com>
To: linux-fsdevel@vger.kernel.org, David Chinner <dgc@sgi.com>,
	linux-ext4@vger.kernel.org, xfs@oss.sgi.com, hch@infradead.org,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	Mike Waychison <mikew@google.com>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [RFC] add FIEMAP ioctl to efficiently map file allocation
Date: Mon, 29 Oct 2007 17:11:26 -0700
Message-ID: <20071030001126.GD28607@ca-server1.us.oracle.com>
In-Reply-To: <20071029221302.GD3042@webber.adilger.int>

On Mon, Oct 29, 2007 at 04:13:02PM -0600, Andreas Dilger wrote:
> On Oct 29, 2007  13:57 -0700, Mark Fasheh wrote:
> > 	Thanks for posting this. I believe that an interface such as FIEMAP
> > would be very useful to Ocfs2 as well. (I added ocfs2-devel to the e-mail)
> 
> I tried to make it as Lustre-agnostic as possible...

IMHO, your description succeeded at that. I'm hoping that the final patch
can have mostly generic code, like FIBMAP does today.


> > > #define FIEMAP_EXTENT_LAST      0x00000020 /* last extent in the file */
> > > #define FIEMAP_EXTENT_EOF       0x00000100 /* fm_start + fm_len beyond EOF*/
> > 
> > Is "EOF" here considering "beyond i_size" or "beyond allocation"?
> 
> _EOF == beyond i_size.
> _LAST == last extent in the file.
> 
> In most cases FIEMAP_EXTENT_EOF will be set at the same time as
> FIEMAP_EXTENT_LAST, but in case of e.g. prealloc beyond i_size the 
> EOF flag may be set on one or more earlier extents.

Oh, ok great - I was primarily looking for a way to say "there's allocation
past i_size" and it looks like we have it.
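
To make the distinction concrete, here is a rough userspace sketch of how a
caller might test for allocation past i_size. It assumes the flag values quoted
above from the RFC (which could still change); the struct eof_sketch_extent,
its fe_flags field name, and both helper functions are made up for
illustration and are not part of any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as quoted in this thread (RFC values, may change). */
#define FIEMAP_EXTENT_LAST 0x00000020 /* last extent in the file */
#define FIEMAP_EXTENT_EOF  0x00000100 /* extent reaches beyond i_size */

/* Hypothetical, trimmed-down extent record, for illustration only. */
struct eof_sketch_extent {
	uint64_t fe_offset; /* byte offset reported for the extent */
	uint64_t fe_length; /* length in bytes */
	uint32_t fe_flags;  /* FIEMAP_EXTENT_* bits (field name assumed) */
};

/* Return 1 if any extent is flagged as reaching beyond i_size. */
static int has_alloc_past_isize(const struct eof_sketch_extent *ext, size_t n)
{
	size_t i;

	assert(ext != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ext[i].fe_flags & FIEMAP_EXTENT_EOF)
			return 1;
	return 0;
}

/*
 * Return 1 if EOF is set on an extent that is not the last one,
 * i.e. the prealloc-beyond-i_size case described above.
 */
static int has_eof_before_last(const struct eof_sketch_extent *ext, size_t n)
{
	size_t i;

	assert(ext != NULL || n == 0);
	for (i = 0; i < n; i++)
		if ((ext[i].fe_flags & FIEMAP_EXTENT_EOF) &&
		    !(ext[i].fe_flags & FIEMAP_EXTENT_LAST))
			return 1;
	return 0;
}
```

The first helper answers "is there any allocation past i_size"; the second
only fires when a whole extent before the last one already carries the EOF
flag.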


> > > FIEMAP_EXTENT_NO_DIRECT means data cannot be directly accessed (maybe
> > > encrypted, compressed, etc.)
> > 
> > Would it be valid to use FIEMAP_EXTENT_NO_DIRECT for marking in-inode data?
> > Btrfs, Ocfs2, and Gfs2 pack small amounts of user data directly in inode
> > blocks.
> 
> Hmm, but part of the issue would be how to request the extra data, and
> what offset it would be given?  One could, for example, use negative
> offsets to represent metadata or something, or add a FIEMAP_EXTENT_META
> or similar, I hadn't given that much thought.

Well, fe_offset and fe_length are already expressed in bytes, so we could
just put the byte offset of where the inline data starts in there. fe_length
would then simply be the length allocated for the inline data.

If fe_offset is required to be block aligned, then we could add a field to
express an offset within the block where data would be found - say
'fe_data_start_offset'. In the non-inline case, we could guarantee that
fe_data_start_offset is zero. That way software which doesn't want to care
whether something is inline-data (for example, a backup program) or not
could just blindly add it to fe_offset before looking at the data.

Regardless, I think we also want to explicitly flag this:

#define FIEMAP_EXTENT_DATA_IN_INODE 0x00000400 /* extent data is stored in inode block */
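
As a rough sketch of the block-aligned variant, this is how a consumer such as
a backup program might use it. The struct inline_sketch_extent layout and the
helpers data_start() and is_inline() are hypothetical;
fe_data_start_offset and the flag value are only the proposals from this
mail, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed in this mail; not an established value. */
#define FIEMAP_EXTENT_DATA_IN_INODE 0x00000400

/* Hypothetical extent record carrying the proposed extra field. */
struct inline_sketch_extent {
	uint64_t fe_offset;            /* block-aligned byte offset */
	uint64_t fe_length;            /* length in bytes */
	uint32_t fe_flags;             /* FIEMAP_EXTENT_* bits (name assumed) */
	uint32_t fe_data_start_offset; /* proposed: bytes into the block,
					* guaranteed 0 when not inline */
};

/*
 * Byte position where the data starts. Because the extra offset is
 * zero for ordinary extents, callers can add it unconditionally.
 */
static uint64_t data_start(const struct inline_sketch_extent *e)
{
	assert(e != NULL);
	return e->fe_offset + e->fe_data_start_offset;
}

/* Return 1 if the extent is explicitly flagged as living in the inode block. */
static int is_inline(const struct inline_sketch_extent *e)
{
	assert(e != NULL);
	return (e->fe_flags & FIEMAP_EXTENT_DATA_IN_INODE) != 0;
}
```

Software that does care about inline data checks is_inline(); software that
doesn't can call data_start() alone and never look at the flag.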


I'm going to pretend that I completely understand reiserfs tail-packing and
say that my approaches above look like they could work for that case too.
We'd want to add a separate flag for tail-packed data though.


> The other issue is that I'd like to get the basics of the API in place
> before it gets too complex. We can always add functionality with more
> FIEMAP_FLAG_* (whether in the INCOMPAT range or not, depending on what is
> being done).

Sure, but I think whatever goes upstream should be able to handle this
case: there are file systems in use _today_ which put data in inode blocks
and pack file tails.

Thanks,
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com
