linux-fsdevel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@sun.com>
To: Mark Fasheh <mfasheh@suse.com>
Cc: linux-fsdevel@vger.kernel.org, Josef Bacik <jbacik@redhat.com>
Subject: Re: [PATCH 1/5] vfs: vfs-level fiemap interface
Date: Wed, 28 May 2008 13:42:15 -0600
Message-ID: <20080528194215.GI7263@webber.adilger.int>
In-Reply-To: <20080525000157.GK8325@wotan.suse.de>

On May 24, 2008  17:01 -0700, Mark Fasheh wrote:
> Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
> inode operation.

Mark, I was looking at a way to remove the special-casing of NUM_EXTENTS
from ioctl_fiemap() in an effort to remove Christoph's objection to
keeping these in the same ioctl.

I think it is possible and reasonable to move the special-case handling
into fiemap_fill_next_extent().

> +static int ioctl_fiemap(struct file *filp, unsigned long arg)
> +{
> +
> +	if (!(fiemap.fm_flags & FIEMAP_FLAG_NUM_EXTENTS) &&
> +	    (fiemap.fm_extent_count == 0 ||
> +	     fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS))
> +		return -EINVAL;

This can be changed to only check:

	if (fiemap.fm_extent_count > FIEMAP_MAX_EXTENTS)
		return -EINVAL;

> +	fieinfo.fi_flags = fiemap.fm_flags;
> +	if (!(fiemap.fm_flags & FIEMAP_FLAG_NUM_EXTENTS)) {
> +		fieinfo.fi_extents_max = fiemap.fm_extent_count;
> +		fieinfo.fi_extents_start = (char *)arg + sizeof(fiemap);
> +
> +		if (!access_ok(VERIFY_WRITE, fieinfo.fi_extents_start,
> +			       fieinfo.fi_extents_max * sizeof(struct fiemap_extent)))
> +			return -EFAULT;
> +	}

It is harmless to set fi_extents_max and fi_extents_start unconditionally,
as they are ignored in the NUM_EXTENTS case.  fiemap_fill_next_extent()
will already check in copy_to_user() whether the fi_extents_start pointer
is valid, and with NUM_EXTENTS it doesn't even get far enough to look at
fi_extents_max or fi_extents_start.  We just do:

	fieinfo.fi_extents_max = fiemap.fm_extent_count;
	fieinfo.fi_extents_start = (struct fiemap_extent *)((char *)arg +
							    sizeof(fiemap));

This leaves us with no checks for FIEMAP_FLAG_NUM_EXTENTS in ioctl_fiemap()
at all, and no changes needed in fiemap_fill_next_extent().
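
As a rough illustration of the flow described above, here is a userspace
model in C (not kernel code).  The model_* names, the flag value and the
extent limit are placeholders invented for this sketch, and the counting
branch is an assumption about how the NUM_EXTENTS case would behave inside
the fill helper:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FLAG_NUM_EXTENTS 0x1u	/* placeholder value, not the kernel's */
#define MODEL_MAX_EXTENTS      1024u	/* placeholder limit */

struct model_extent {
	uint64_t logical;
	uint64_t physical;
	uint64_t length;
};

struct model_extent_info {
	unsigned int flags;
	unsigned int extents_mapped;
	unsigned int extents_max;
	struct model_extent *extents_start;
};

/* Entry point: one upper-bound check, no NUM_EXTENTS special case. */
static int model_ioctl_setup(struct model_extent_info *info,
			     unsigned int flags, unsigned int extent_count,
			     struct model_extent *buf)
{
	if (extent_count > MODEL_MAX_EXTENTS)
		return -EINVAL;

	info->flags = flags;
	info->extents_mapped = 0;
	info->extents_max = extent_count;	/* set even when only counting */
	info->extents_start = buf;
	return 0;
}

/* Returns 1 when the caller should stop mapping, 0 to keep going. */
static int model_fill_next_extent(struct model_extent_info *info,
				  uint64_t logical, uint64_t physical,
				  uint64_t length)
{
	struct model_extent *dest;

	/* Counting mode: never touches extents_max or extents_start. */
	if (info->flags & MODEL_FLAG_NUM_EXTENTS) {
		info->extents_mapped++;
		return 0;
	}

	/* No room left (also covers extent_count == 0). */
	if (info->extents_mapped >= info->extents_max)
		return 1;

	dest = &info->extents_start[info->extents_mapped];
	dest->logical = logical;
	dest->physical = physical;
	dest->length = length;
	info->extents_mapped++;

	return info->extents_mapped == info->extents_max;
}
```

In this model a request with an extent count of zero and no NUM_EXTENTS
flag is no longer rejected up front; the helper simply stops before
storing anything.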

> > What about the idea to have fiemap_fill_next_extent() do "extent" merging
> > for filesystems that use the generic helper but do not return multiple
> > blocks via get_blocks()?  I don't think that is too hard to implement,
> > and makes the output more useful, otherwise we get an extent per block.  
> > The above is what I _think_ will work, haven't actually tried it out.
> 
> I don't think we want to automatically merge extents within this helper
> function. Otherwise we would diverge from the actual disk layout for extent
> based file systems where an extent might be broken up between two records
> for some other reason, such as maximum extent length being exceeded.

Do we really want to expose the filesystem-specific extent-length limits
to userspace?  In some sense, a block-based filesystem has a maximum
extent length of the blocksize, but it seems totally reasonable to merge
contiguous blocks into a single "extent" for return to userspace.  I
don't see this as significantly different for ext4, even though it can have
extents up to 128MB, or unwritten extents up to 64MB.
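
To make the merging idea concrete, here is a small userspace sketch in C.
It is only a model: struct block_map, struct merged_extent and
merge_blocks() are hypothetical names that appear nowhere in the patch,
and it assumes the per-block mappings arrive sorted by logical block.
A block is folded into the previous extent only when it is contiguous
both logically and physically:

```c
#include <stddef.h>
#include <stdint.h>

/* One mapping per block, as a block-at-a-time filesystem would report. */
struct block_map {
	uint64_t logical;	/* logical block number in the file */
	uint64_t physical;	/* physical block number on disk */
};

/* A run of blocks contiguous in both logical and physical space. */
struct merged_extent {
	uint64_t logical;
	uint64_t physical;
	uint64_t count;		/* number of blocks in the run */
};

/*
 * Merge per-block mappings into extents.  Returns the number of extents
 * written to out; stops early if out_max extents are already in use and
 * another one would be needed.
 */
static size_t merge_blocks(const struct block_map *blocks, size_t nblocks,
			   struct merged_extent *out, size_t out_max)
{
	size_t n = 0;

	for (size_t i = 0; i < nblocks; i++) {
		if (n > 0 &&
		    blocks[i].logical == out[n - 1].logical + out[n - 1].count &&
		    blocks[i].physical == out[n - 1].physical + out[n - 1].count) {
			/* Extends the current run: grow it by one block. */
			out[n - 1].count++;
			continue;
		}

		if (n == out_max)
			break;

		/* Start a new extent at this block. */
		out[n].logical = blocks[i].logical;
		out[n].physical = blocks[i].physical;
		out[n].count = 1;
		n++;
	}

	return n;
}
```

With six single-block mappings where the first three and the next two
are contiguous, this returns three extents instead of six.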

> Btw, how many block-based file systems that don't return multiple blocks via
> get_blocks() are there that we actually care about enough to write this
> code?

That I have no clue about.  Josef?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

