linux-fsdevel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@sun.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Chris Mason <chris.mason@oracle.com>, jim owens <jowens@hp.com>,
	linux-fsdevel@vger.kernel.org, Mark Fasheh <mfasheh@suse.com>,
	Andreas Dilger <adilger@shaw.ca>,
	Kalpak Shah <Kalpak.Shah@sun.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Josef Bacik <jbacik@redhat.com>
Subject: Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl
Date: Thu, 29 May 2008 14:17:56 -0600
Message-ID: <20080529201756.GF2985@webber.adilger.int>
In-Reply-To: <20080529130134.GA21299@infradead.org>

On May 29, 2008  09:01 -0400, Christoph Hellwig wrote:
> On Wed, May 28, 2008 at 10:09:31AM -0600, Andreas Dilger wrote:
> > ... but I don't think it should necessarily be _required_ to return a
> > real "dev_t" (major, minor) device.  For network filesystems this is
> > meaningless.  If it is possible for FIEMAP_EXTENT_NET to signal that the
> > device is not a local/physical device (where a dev_t has no meaning),
> > and simply allow an enumeration [0, 1, 2, ...] of the logical devices
> > then I think this is reasonable.  The mapping of logical devices to
> > servers is available separately with a Lustre-specific ioctl.
> > 
> > This passes more information for filesystems that have local devices
> > while not breaking the functionality for network filesystems and could
> > be used as an efficient replacement for lilo's use of FIBMAP.
> 
> A dev_t actually means something for the only in-tree users of
> this interface, so there's no point making this interface worse for
> some long-term out of tree code.  And it's not like you simply can't
> allow multiple anonymous blockdevices for your networked filesystems
> similar to the one used for st_dev already.

But requiring that 1500 anonymous blockdevices (== number of storage
targets) be created at mount time, and exporting some varying-over-reboot,
inconsistent-across-clients, random-value dev_t for network filesystems,
just for the possibility that the client is going to do FIEMAP, isn't
making the interface better either...

Getting devices of [0x1908afed, 0x4058204b] back from FIEMAP of a file
on one client, and [0x4bac5821, 0x0abefd63] on another client is pretty
useless compared to devices [2, 4], which have very clear meanings,
will always be the same across all clients, and the same across reboots.
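
As a rough illustration of the point above, here is a minimal sketch of how
a userspace consumer might interpret a per-extent device field under this
proposal. The Extent record, the numeric value given to FIEMAP_EXTENT_NET,
and describe_device() are all hypothetical names made up for this example;
none of them are taken from the patch set.

```python
import os
from dataclasses import dataclass

# Hypothetical bit value, chosen only for this sketch.
FIEMAP_EXTENT_NET = 0x1


@dataclass(frozen=True)
class Extent:
    """Hypothetical userspace view of one returned extent."""
    logical: int    # byte offset within the file
    length: int     # extent length in bytes
    device: int     # dev_t, or a logical device index if the NET flag is set
    flags: int = 0


def describe_device(ext: Extent) -> str:
    """Render the device field according to the NET flag."""
    if ext.flags & FIEMAP_EXTENT_NET:
        # Network filesystem: a small, stable enumeration [0, 1, 2, ...].
        return f"logical device {ext.device}"
    # Local filesystem: a real dev_t that splits into (major, minor).
    return f"dev_t {os.major(ext.device)}:{os.minor(ext.device)}"
```

With this shape, two clients mapping the same file would both report
"logical device 2" and "logical device 4", so the results can be compared
directly across clients and across reboots.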

> > For RAID1/10 you can return multiple logical->physical extent mappings
> > for the same logical range of the file with different "device" IDs.  You
> > could do the same for RAID5 returning each of the data and parity chunks
> > with "NO_DIRECT" if desired (maybe only on the parity extent, or don't
> > return the parity extent at all).  The spec does not require that the
> > returned extents be non-overlapping.
> 
> Umm, no.  That'd just make the interface too complicated.  I can bet
> with you that userspace programmers will generally only test their code
> with simple filesystems and hell will break loose when they get these
> multiple ranges.  Especially as that's a very unnatural interface.

The metadata information isn't exposed to callers by default; they have
to request it explicitly with e.g. FIEMAP_FLAG_METADATA.  For the most
common use cases, applications/users will care about:
a) for cp/tar/dd/etc they only want to know where there are holes.  This
   is available in the most simple instance of FIEMAP (no flags).
b) for "fiemap" the user will want to know whether there are large or
   small contiguous allocations/fragmentation, or just the extent count.
c) for sophisticated users (e.g. filesystem developers, performance tuning)
   they want to know the extent information, the metadata layout, and
   possibly the mapping all the way down to the platters.
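
Use case (a) can be sketched as below. This is only an illustration under
stated assumptions: extents are taken to be plain (logical_offset, length)
pairs in bytes that some FIEMAP wrapper has already returned, and
find_holes() is a hypothetical helper name. The sketch sorts and merges the
ranges first, so overlapping extents (e.g. one per mirror) do not confuse
the hole calculation.

```python
def find_holes(extents, file_size):
    """Return (offset, length) ranges in [0, file_size) covered by no extent.

    extents: iterable of (logical_offset, length) pairs, in any order,
    possibly overlapping.
    """
    holes = []
    pos = 0  # end of the region known to be covered so far
    for start, length in sorted(extents):
        if start > pos:
            # Gap between the covered region and this extent.
            holes.append((pos, min(start, file_size) - pos))
        pos = max(pos, start + length)
        if pos >= file_size:
            break
    if pos < file_size:
        # Trailing hole up to end of file.
        holes.append((pos, file_size - pos))
    return holes
```

A tool such as cp or tar would then skip the returned ranges instead of
reading and writing zeroes, and the extent count for use case (b) is just
the length of the input list.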

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

