From: Andreas Dilger <adilger@sun.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Mark Fasheh <mfasheh@suse.com>,
linux-fsdevel@vger.kernel.org, Andreas Dilger <adilger@shaw.ca>,
Kalpak Shah <Kalpak.Shah@sun.com>,
Eric Sandeen <sandeen@redhat.com>,
Josef Bacik <jbacik@redhat.com>
Subject: Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl
Date: Mon, 26 May 2008 04:59:44 -0600 [thread overview]
Message-ID: <20080526105944.GN3516@webber.adilger.int> (raw)
In-Reply-To: <20080525194203.GB24328@infradead.org>
On May 25, 2008 15:42 -0400, Christoph Hellwig wrote:
> On Sat, May 24, 2008 at 05:01:48PM -0700, Mark Fasheh wrote:
> > * FIEMAP_FLAG_HSM_READ
> > If the extent is offline, retrieve it before mapping and do not flag
> > it as FIEMAP_EXTENT_SECONDARY. This flag has no effect if the file
> > system does not support HSM.
>
> Given that there's no HSM support in mainline this should not be added.
> It'll be useful once we add proper HSM support, though :)
This was added at the request of David for XFS, because the XFS bmap
ioctl defaults to reading in extents from HSM. I don't have any
attachment to it myself.
> > * FIEMAP_FLAG_LUN_ORDER
> > If the file system stripes file data, this will return contiguous
> > regions of physical allocation, sorted by LUN. Logical offsets may not
> > make sense if this flag is passed. If the file system does not support
> > multiple LUNs, this flag will be ignored.
>
> A LUN doesn't make any sense in filesystem context. That's a
> scsi-centric acronym that doesn't even make sense in a scsi-centric
> filesystem universe because a LUN can of course contain multiple
> partitions. It's also extremly ill-defined when using volume managers.
What else do you propose calling this? It isn't a LUN in the SCSI sense
of course, but there is definitely a need to be able to identify multiple
disks. Regardless of whether there is a single disk or multiple disks
involved, it is generally called a LUN. It is a better than calling it
a "disk" or a "partition".
> There's also no filesystems that actually support a single file on
> multiple device in mainline, the only filesystem that supports multiple
> data devices at all (XFS) requires each file to be on a single device.
>
> Once we have a filesystem with real multiple data device support like
> btrfs or a future XFS version we can worry about this and defined
> a different ioctl for it.
I don't see why we need a different ioctl for mapping extents on a
filesystem that support direct access to multiple disks. Having one
mechanism that returns the file mapping is much more simple for user
space applications (filefrag, cp, tar, gzip, etc) than having to use
different ioctls for different backing filesystems.
> > Each extent is described by a single fiemap_extent structure as
> > returned in fm_extents.
> >
> > struct fiemap_extent {
> > __u64 fe_logical;/* logical offset in bytes for the start of
> > * the extent */
> > __u64 fe_physical; /* physical offset in bytes for the start
> > * of the extent */
> > __u64 fe_length; /* length in bytes for the extent */
> > __u32 fe_flags; /* returned FIEMAP_EXTENT_* flags for the extent */
> > __u32 fe_lun; /* logical device number for extent (starting at 0)*/
>
> Again this lun thing is horribly ill-defined. There is no such thing
> as a logic device number in our filesystem terminology.
Propose a better name then, but the need for it will not go away. This
is needed for Lustre, btrfs, pNFS, etc. The whole point of developing
this API and getting input from all of the main filesystems was to have
a single common interface that could be used by all filesystems.
> > struct fiemap_extent_info {
> > unsigned int fi_flags; /* Flags as passed from user */
> > unsigned int fi_extents_mapped; /* Number of mapped extents */
> > unsigned int fi_extents_max; /* Size of fiemap_extent array */
> > char *fi_extents_start; /* Start of fiemap_extent array */
> > };
>
> Why is this passes a structure instead of individual arguments?
Saves on passing this around as arguments on the stack? Also, for ext4
there is an iterator function which needs a private data struct passed,
and it doesn't make sense to require duplicating all of this information
again.
> Also why isn't fi_extents_start properly typed?
I was wondering about that, I'm not sure why Mark implemented it that
way. I would have thought that it should be a struct fiemap_extent *.
I thought maybe to allow for misaligned userspace pointers, but I'm
not sure.
> > If the request has the FIEMAP_FLAG_NUM_EXTENTS flag set, then calling
> > this helper is not necessary and fi_extents_mapped can be set
> > directly.
>
> Sounds like the count number of extents request should be a separate
> ioctl and separate filesystem entry point instead of overloading FIEMAP.
I don't see that at all. The operations that the filesystem has to do
are basically the same whether it is counting extents or returning them.
All that would result from having separate ioctl and filesystem methods
would be a lot of code duplication.
The fiemap_fill_next_extents() call will handle the NUM_EXTENTS operation
internally, and the filesystem code doesn't need to special case this
at all. The only time the NUM_EXTENTS case would be handled by the
filesystem specially would be if it tracks the count of extents itself
for some reason.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
next prev parent reply other threads:[~2008-05-26 10:59 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-25 0:01 [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl Mark Fasheh
2008-05-25 19:42 ` Christoph Hellwig
2008-05-25 20:59 ` Brad Boyer
2008-05-26 10:59 ` Andreas Dilger [this message]
2008-05-26 18:04 ` Brad Boyer
2008-05-27 16:45 ` Christoph Hellwig
2008-05-27 21:10 ` Mark Fasheh
2008-05-27 13:48 ` Chris Mason
2008-05-27 16:21 ` Eric Sandeen
2008-05-27 16:47 ` Christoph Hellwig
2008-05-27 20:34 ` Joel Becker
2008-05-27 16:52 ` jim owens
2008-05-27 17:19 ` Chris Mason
2008-05-28 16:09 ` Andreas Dilger
2008-05-28 16:33 ` Chris Mason
2008-05-29 22:01 ` Andreas Dilger
2008-05-30 13:37 ` Chris Mason
2008-05-29 13:01 ` Christoph Hellwig
2008-05-29 20:17 ` Andreas Dilger
2008-05-27 18:56 ` Mark Fasheh
2008-05-27 20:31 ` Joel Becker
2008-05-27 20:49 ` Mark Fasheh
2008-05-28 5:14 ` Christoph Hellwig
2008-05-28 16:02 ` Andreas Dilger
2008-05-28 17:04 ` Joel Becker
2008-05-29 0:51 ` Dave Chinner
2008-05-29 13:02 ` Christoph Hellwig
2008-05-29 15:33 ` jim owens
2008-05-29 15:53 ` Jamie Lokier
2008-05-29 18:56 ` Joel Becker
2008-05-29 21:41 ` Andreas Dilger
2008-05-29 21:47 ` Joel Becker
2008-05-29 23:20 ` Andreas Dilger
2008-05-29 1:17 ` Andreas Dilger
2008-05-29 5:55 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080526105944.GN3516@webber.adilger.int \
--to=adilger@sun.com \
--cc=Kalpak.Shah@sun.com \
--cc=adilger@shaw.ca \
--cc=hch@infradead.org \
--cc=jbacik@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mfasheh@suse.com \
--cc=sandeen@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).