Re: [PATCH 1/5] vfs: vfs-level fiemap interface

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Andreas Dilger <adilger@sun.com>
To: Mark Fasheh <mfasheh@suse.com>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/5] vfs: vfs-level fiemap interface
Date: Sun, 25 May 2008 01:28:16 -0600	[thread overview]
Message-ID: <20080525072816.GH3516@webber.adilger.int> (raw)
In-Reply-To: <20080525000157.GK8325@wotan.suse.de>

On May 24, 2008  17:01 -0700, Mark Fasheh wrote:
> Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
> inode operation.

Mark, this appears to be missing most of my previous comments.  Are
you getting my emails, or is there still a problem?

> +++ b/Documentation/filesystems/fiemap.txt
> +A fiemap request is encoded within struct fiemap:
> +
> +struct fiemap {
> +	__u64	fm_start;	 /* logical offset (inclusive) at
> +				  * which to start mapping (in) */
> +	__u64	fm_length;	 /* logical length of mapping which
> +				  * userspace cares about (in) */
> +	__u32	fm_flags;	 /* FIEMAP_FLAG_* flags for request (in) */

This should be marked (in/out) because of error return, documented below.

> +* FIEMAP_EXTENT_UNKNOWN
> +The location of this extent is currently unknown. This may indicate
> +the data is stored on an inaccessible volume or that no storage has
> +been allocated for the file yet.
> +
> +* FIEMAP_EXTENT_SECONDARY
> +  - This will also set FIEMAP_EXTENT_UNKNOWN.
> +The data for this extent is in secondary storage.

This is not correct that it will always set FIEMAP_EXTENT_UNKNOWN here.
It is possible (common) for data to be stored on both primary AND
secondary storage at the same time.  It is still desirable to mark the
data as FIEMAP_EXTENT_SECONDARY even if it is still resident in the
filesystem, because this informs the user that the HSM/backup system
has made a copy of this file already.

> +For each extent in the request range, the file system should call
> +the helper function, fiemap_fill_next_extent():
> +
> +int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
> +			    u64 phys, u64 len, u32 flags, u32 lun);
> +
> +fiemap_fill_next_extent() will use the passed values to populate the
> +next free extent in the fm_extents array. 'General' extent flags will
> +automatically be set from specific flags on behalf of the calling file
> +system so that the userspace API is not broken.
> +
> +fiemap_fill_next_extent() returns 0 on success, and 1 when the
> +user-supplied fm_extents array is full. If an error is encountered
> +while copying the extent to user memory, -EFAULT will be returned.
> +
> +If the request has the FIEMAP_FLAG_NUM_EXTENTS flag set, then calling
> +this helper is not necessary and fi_extents_mapped can be set
> +directly.

It is worthwhile to mention here that if FIEMAP_FLAG_NUM_EXTENTS IS set,
it is also OK for the filesystem to call fiemap_fill_next_extent(), and
only the number of extents will be counted, avoiding the need to special-
case this flag in the filesystem implementation.



What about the idea to have fiemap_fill_next_extent() do "extent" merging
for filesystems that use the generic helper but do not return multiple
blocks via get_blocks()?  I don't think that is too hard to implement,
and makes the output much more useful, otherwise we get an extent per block.
The above is what I _think_ will work, haven't actually tried it out.

struct fiemap_extnent_info {
	struct fiemap_extent	fi_last;	/* Copy of previous extent */
	unsigned int	fi_flags;		/* Flags as passed from user */
	unsigned int	fi_extents_mapped;      /* Number of mapped extents */
	unsigned int	fi_extents_max;		/* fiemap_extent array size */
	char            *fi_extents_start;	/* fiemap_extent array start */
}

int fiemap_fill_next_extent(struct fiemap_extent_info *fieinfo,
			    u64 logical, u64 phys, u64 len, u32 flags, u32 lun)
{
	struct fiemap_extent *extent = &fieinfo->fi_extent_last;
	int joined = 1;

	if (flags & SET_UNKNOWN_FLAGS)
		flags |= FIEMAP_EXTENT_UNKNOWN;
	if (flags & SET_NO_DIRECT_FLAGS)
		flags |= FIEMAP_EXTENT_NO_DIRECT;
	if (flags & SET_NOT_ALIGNED_FLAGS)
		flags |= FIEMAP_EXTENT_NOT_ALIGNED;

	/* If this is an extension of the previous extent, add it in */
	if (fieinfo->fi_extents_mapped > 0 &&
	    extent->fe_logical + extent->fe_length == logical &&
	    extent->fe_physical + extent->fe_length == phys &&
	    extent->fe_flags == (flags & ~FIEMAP_FLAG_LAST) &&
	    extent->fe_lun == lun) {
	    	extent->fe_length += len;
		extent->fe_flags = flags;	/* maybe set FIEMAP_FLAG_LAST */

		if (!(fieinfo->fi_flags & FIEMAP_FLAG_NUM_EXTENTS))
			fieinfo->fe_extents_start -= sizeof(*extent);
		fieinfo->fi_extents_mapped--;

		joined = 1;
	}

	if (fieinfo->fi_flags & FIEMAP_FLAG_NUM_EXTENTS) {
		fieinfo->fi_extents_mapped++;
		return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
	}

	if (joined)
		goto copy;

	if (fieinfo->fi_extents_mapped >= fieinfo->fi_extents_max)
		return 1;

	/* Store new extent info in fi_extent_last for next call */
	extent->fe_logical = logical;
	extent->fe_physical = phys;
	extent->fe_length = len;
	extent->fe_flags = flags;
	extent->fe_lun = lun;

copy:
	if (copy_to_user(fieinfo->fi_extents_start, extent, sizeof(*extent)))
		return -EFAULT;

	fieinfo->fi_extents_start += sizeof(*extent);
	fieinfo->fi_extents_mapped++;

	/* Don't return 1 if we are using the last extent slot as we may
	 * be able to merge with the next extent, and we check above if
	 * the next extent won't fit into the array.
	 */

	return (flags & FIEMAP_EXTENT_LAST) ? 1 : 0;
}
EXPORT_SYMBOL(fiemap_fill_next_extent);

> +static int ioctl_fiemap(struct file *filp, unsigned long arg)
> +{
> +	struct fiemap fiemap;
> +	u64 len;
> +	u32 incompat_flags;
> +	struct fiemap_extent_info fieinfo = {0, };
> +	struct inode *inode = filp->f_path.dentry->d_inode;
> +	struct super_block *sb = inode->i_sb;
> +	int error = 0;
> +
> +	if (!inode->i_op->fiemap)
> +		return -EOPNOTSUPP;
> +
> +	if (copy_from_user(&fiemap, (struct fiemap __user *) arg,
> +			   sizeof(struct fiemap)))
> +		return -EFAULT;
> +
> +	/*
> +	 * The VFS does basic sanity checks on the flag
> +	 * value. Individual file systems can also reject otherwise
> +	 * 'valid' flags by returning -EBADR from ->fiemap.
> +	 */
> +	incompat_flags = fiemap.fm_flags & ~FIEMAP_FLAGS_COMPAT;
> +	if (incompat_flags)
> +		goto out_bad_flags;

Please remove this, and leave the checking for the individual filesystems.
They all need to do it anyways, because they can't depend on the check here
never changing or always being updated when this check does.  At best it is
redundant with the check in the filesystem, at worst it will lead to bugs
in the filesystem-specific code due to inconsistent checks.

> +#define FIEMAP_FLAGS_COMPAT	(FIEMAP_FLAG_NUM_EXTENTS|FIEMAP_FLAG_SYNC|   \
> +				 FIEMAP_FLAG_HSM_READ|FIEMAP_FLAG_XATTR|     \
> +				 FIEMAP_FLAG_LUN_ORDER)

This isn't needed anymore.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

next prev parent reply	other threads:[~2008-05-25  7:28 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-25  0:01 [PATCH 1/5] vfs: vfs-level fiemap interface Mark Fasheh
2008-05-25  7:28 ` Andreas Dilger [this message]
2008-05-27 18:31   ` Mark Fasheh
2008-05-28 16:09     ` Andreas Dilger
2008-05-28 17:24       ` Joel Becker
2008-05-29 23:46         ` Andreas Dilger
2008-05-30  0:15           ` Mark Fasheh
2008-05-30 17:24             ` Andreas Dilger
2008-05-28 19:42 ` Andreas Dilger
2008-05-28 19:54   ` Josef Bacik
2008-05-28 20:12     ` Mark Fasheh
2008-05-28 20:19       ` Josef Bacik
2008-05-28 21:23   ` Mark Fasheh
2008-05-29  1:24   ` Dave Chinner
2008-05-29 13:04     ` Christoph Hellwig
2008-05-29 17:02       ` Andreas Dilger
2008-05-31  8:16         ` Christoph Hellwig
2008-05-29 13:03   ` Christoph Hellwig
2008-06-05  5:18 ` Andreas Dilger
2008-06-05 21:35   ` jim owens

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080525072816.GH3516@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).