linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Fasheh <mfasheh@suse.com>
To: Andreas Dilger <adilger@sun.com>
Cc: linux-fsdevel@vger.kernel.org, Sage Weil <sage@newdream.net>,
	Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
Date: Tue, 7 Apr 2009 08:29:59 -0700	[thread overview]
Message-ID: <20090407152959.GS14571@wotan.suse.de> (raw)
In-Reply-To: <20090407102332.GS3204@webber.adilger.int>

On Tue, Apr 07, 2009 at 03:23:32AM -0700, Andreas Dilger wrote:
> On Apr 07, 2009  01:00 -0700, Mark Fasheh wrote:
> > This very, very rough patch set adds three flags to fstatat - AT_NO_SIZE,
> > AT_NO_TIMES, and AT_STRICT.
> 
> It seems you and Oleg have two patches crossing in the night...

Heh, it seems so. At least we went from having zero interest in implementing
this to two patches! At any rate, thanks for taking the time to review them.


> > The first two flags (AT_NO_SIZE, AT_NO_TIMES) allow userspace to notify the
> > file system layer that certain stat fields are not required to be accurate.
> 
> The problem with "AT_NO_*" is that old applications which couldn't possibly
> know about or use a new stat field couldn't possibly know not to ask for
> it. 

Well, if we add stat fields, we need new system calls _anyway_ because
struct stat has to undergo a change. At that point, adding a new flag to the
modified fstatat, or just changing the behavior of an existing flag in a
backwards compatible manner should be quite easy in comparison.


> Instead, as was proposed in the last generation of this thread, there
> should be AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
> flags that the application actually wants.  

Well, I respectfully disagree. In my opinion, this adds unncessary
complexity to an already targetted case, where only a handful of file
systems will even care to optimize away a couple of fields. In that light, I
really think we want simplicity here. The good news is that we still have a
very good amount of space in the flags field. We could always go back if we
_really_ find the need to micro-optimize to the point of per stat field
flags. Again though, at this point I don't really think starting out with a
slew of flags is a good idea.


> If none of them are specified, then the current behaviour of "get all
> attributes" is kept.

This of course, is how it works today - if none of the new flags are
specified you get the exact old behavior. In fact, most file systems will
wind up just happily ignoring the new flags.


> 
> > Some file systems want this information in order to optimize away expensive
> > operations associated with stat. In particular, NFS can avoid some syncing
> > to the server (if userspace doesn't want atime, ctime or mtime) and Lustre
> > can avoid some expensive locking by avoiding an update of various size
> > fields (st_size, st_blocks).
> 
> Actually, despite what was said today, Lustre doesn't revoke the writing
> client's locks when getting the file size, unlike block-based filesystems.
> That said, it is still relatively a lot of work to query the size for
> a widely-striped file since there may be a hundred servers holding the
> file data and any one of them might have the end-of-file.

Ahh ok, but the point still stands, right? It costs a significant amount of
performance to query each file stripe for the purposes of a size calculation
during a simple 'ls'. Is the behavior as implemented by AT_SIZE sufficient
for Lustre to avoid such a performance loss?


> > AT_STRICT allows userspace to indicate that it wants the most up to date
> > version of a files status, regardless of performance impact. A distributed
> > file system which has a non-coherent inode cache would know then to send a
> > direct query to it's server.
> 
> The issue with AT_STRICT is that applications TODAY are expecting the
> proper file status.  I'd be more inclined to have an AT_LAZY flag for
> applications that do NOT require the most up-to-date information.

The reason I put that in there is because I believe there may be some
file systems which implement the 'lazy' behavior for regular stat so this
seemed a good way to short circuit that caching for those apps that really
care.

That said, I'm not 100% sure either AT_STRICT or AT_LAZY are really required
for a first pass - it just seemed convenient to throw AT_STRICT in there.


> > As noted previously, these patches are really rough. Mostly I'd like to get
> > some feedback on the interface and general direction of implementation. Some
> > glaring issues which we want to resolve:
> > 
> > - This patch set doesn't actually wire up any client file systems yet :)
> > 
> > - There's the question of whether we wire up [fl]stat(2) variants instead of
> >   using fstatat. I went with the former as the implementation was more
> >   straight forward.
> 
> It appears that fstatat(2) fits the need for "statlite" cleanly.  I'd
> thought today that this would require opening each file, but it is
> only necessary to open the parent directory and call fstatat() for
> each filename in the directory.
> 
> > - I'm not sure whether we want to force zeroing of the optional fields, or
> >   return whatever's in the inode (which may be stale, or just junk).
> 
> I think yes, since there have been security bugs filed in rare cases when
> other bits of kernel data are exposed to user space, even if just a byte
> or two.

Yes, good idea. I'll implement that zeroing via the client fs ->setattr
methods in my next pass. By the way, I think that adds to my argument that
we don't really want to micro-optimize the flags yet - now all file systems
will have to switch on a ton of attr flags during their zeroing... Granted
that can be turned into an exported function but hopefully you see my
point.


> > -int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> > +int vfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat,
> > +		int flags)
> >  {
> >  	if (inode->i_op->getattr)
> > -		return inode->i_op->getattr(mnt, dentry, stat);
> > +		return inode->i_op->getattr(mnt, dentry, stat, attr_flags);
> 
> Oleg's version added a second ->getattr_lite() call that passed the flags,
> which avoids changing all of the filesystems, though I guess it isn't a
> huge deal either way.

Yeah, I'll look at how he did it. I just didn't want this to return ENOSYS
or whatnot if a file system doesn't have ->getattr_lite() but there's other
ways around that.
	--Mark

--
Mark Fasheh

  parent reply	other threads:[~2009-04-07 15:30 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-07  8:00 [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags Mark Fasheh
2009-04-07 10:23 ` Andreas Dilger
2009-04-07 14:59   ` Oleg Drokin
2009-04-07 15:18   ` Sage Weil
2009-04-07 15:29   ` Mark Fasheh [this message]
2009-04-07 17:52     ` Andreas Dilger
2009-04-07 18:13       ` Jamie Lokier
2009-04-07 23:13   ` Russell Cattelan
2009-04-07 17:42 ` Jamie Lokier
2009-04-07 17:48   ` Oleg Drokin
2009-04-07 18:16     ` Jamie Lokier
2009-04-07 18:24       ` Oleg Drokin
2009-04-07 20:37         ` Trond Myklebust
2009-04-08 18:48           ` Jamie Lokier
2009-04-09 15:13             ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090407152959.GS14571@wotan.suse.de \
    --to=mfasheh@suse.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=adilger@sun.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).