linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Joel Becker <Joel.Becker@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/5] vfs: vfs-level fiemap interface
Date: Thu, 29 May 2008 17:46:48 -0600	[thread overview]
Message-ID: <20080529234648.GN2985@webber.adilger.int> (raw)
In-Reply-To: <20080528172434.GB6031@mail.oracle.com>

On May 28, 2008  10:24 -0700, Joel Becker wrote:
> On Wed, May 28, 2008 at 10:09:52AM -0600, Andreas Dilger wrote:
> > But the problem is that people are error prone in their updating of code,
> > and if the filesystems assume "the VFS has checked all of the flags except
> > this one I don't understand" will likely become incorrect over time as
> > someone will forget, will misunderstand whether the different per-fs codes
> > need to be updated, or some patch will be delayed in a FS maintainer queue
> > while the VFS "acceptance" of the new feature will be included upstream.
> 
> 	This is a specious argument - if it doesn't go upstream, we
> then have the overloaded-flag problem.

I was actually thinking of the opposite case - the VFS part of the new
flag is included upstream (i.e. ioctl_fiemap() allows the new flag),
but the filesystem-specific part is delayed by some maintainer (or lack
thereof).

We've had an ongoing issue with ext4 because we need EXPORT_SYMBOL(zero_page),
but this is not making it through the m68k maintainer yet the ext4 part of
the patch is already upstream and Andrew complains about it regularly.

> If you're looking for vendor flags, let's just design a space for them.

By no means am I looking for "private" flags or adding support for flags
that don't exist upstream (assuming it is reasonable to get new flags
upstream).  What I'm specifically concerned about is being able to support
new features that are properly accepted upstream in Lustre built against
older vendor kernels.  We are trying to get out of the kernel-patching
days because customers aren't willing to void their kernel or 3rd-party
application support by running a patched kernel on the client.

Since this is a relatively new API, I think several features like
FIEMAP_FLAG_XATTR, FIEMAP_FLAG_METADATA, and maybe a few others will be
added in the next several months, and some vendor will grab one of the
"has FIEMAP, but not all of the flags" kernels and we won't be able to
add newer features on that kernel for possibly several years.

> > The issue is that most users don't have the latest upstream kernel
> > because they are using a vendor kernel that is a few years old, as you
> > likely know, but an updated Lustre or OCFS2 or btrfs should work with
> > the existing vendor kernels.
> > 
> > If we wanted to add something like FIEMAP_FLAG_METADATA, if the check
> > was done in the VFS, it would be impossible without patching the client
> > even if it exactly matched the upstream kernel implementation.
> 
> 	First, getting vendor kernels to update a supported flag set
> that is already in mainline is pretty easy.  They are rightly interested
> in following a well-defined interface, which is what Mark's trying to do
> - no filesystems supporting flags that aren't part of the well-defined
> interface.

Reasonably so, yes.  The issue is that everyone is busy, and what may
be a priority for us isn't necessarily for the vendor, and there is
another hurdle trying to get the customer to upgrade the kernels on
their 10000-node cluster to add some bits to the compatibility flags.
Being able to add in e.g. FIEMAP_FLAG_XATTR ourselves is easier.

> 	But if you are really worried about no kernel updates when you
> install a new fs module, you can still solve it with a generic check.
> Just add /proc/sys/fs/fiemap-flag-mask.  This covers any new flags for
> the generic VFS check.  Alternately, allow filesystems to register their
> flags and then do the VFS check based on that.

If you are suggesting that the filesystems all export their "supported
flags" mask somewhere, and the VFS uses that for a check, then yes I agree
it would be possible to do.  I don't see a huge benefit of that over just
letting the filesystems do it directly themselves at that point.  Adding a
/proc or /sys or /debugfs tunable for this seems heavyweight, and needs
a sysctl or other setting on each boot - a pain for diskless clients.

It seems backward to me to add arbitrary limits to the API when it was
designed in the first place to be flexible and allow features to be
added easily.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  reply	other threads:[~2008-05-29 23:46 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-25  0:01 [PATCH 1/5] vfs: vfs-level fiemap interface Mark Fasheh
2008-05-25  7:28 ` Andreas Dilger
2008-05-27 18:31   ` Mark Fasheh
2008-05-28 16:09     ` Andreas Dilger
2008-05-28 17:24       ` Joel Becker
2008-05-29 23:46         ` Andreas Dilger [this message]
2008-05-30  0:15           ` Mark Fasheh
2008-05-30 17:24             ` Andreas Dilger
2008-05-28 19:42 ` Andreas Dilger
2008-05-28 19:54   ` Josef Bacik
2008-05-28 20:12     ` Mark Fasheh
2008-05-28 20:19       ` Josef Bacik
2008-05-28 21:23   ` Mark Fasheh
2008-05-29  1:24   ` Dave Chinner
2008-05-29 13:04     ` Christoph Hellwig
2008-05-29 17:02       ` Andreas Dilger
2008-05-31  8:16         ` Christoph Hellwig
2008-05-29 13:03   ` Christoph Hellwig
2008-06-05  5:18 ` Andreas Dilger
2008-06-05 21:35   ` jim owens

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080529234648.GN2985@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=Joel.Becker@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).