linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Mark Fasheh <mfasheh@suse.com>
Cc: linux-fsdevel@vger.kernel.org, Sage Weil <sage@newdream.net>,
	Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags
Date: Tue, 07 Apr 2009 10:52:07 -0700	[thread overview]
Message-ID: <20090407174821.GU3204@webber.adilger.int> (raw)
In-Reply-To: <20090407152959.GS14571@wotan.suse.de>

On Apr 07, 2009  08:29 -0700, Mark Fasheh wrote:
> On Tue, Apr 07, 2009 at 03:23:32AM -0700, Andreas Dilger wrote:
> > The problem with "AT_NO_*" is that old applications which couldn't possibly
> > know about or use a new stat field couldn't possibly know not to ask for
> > it. 
> 
> Well, if we add stat fields, we need new system calls _anyway_ because
> struct stat has to undergo a change. At that point, adding a new flag to the
> modified fstatat, or just changing the behavior of an existing flag in a
> backwards compatible manner should be quite easy in comparison.

Actually, there are some reserved fields in the current stat struct,
and in the past these fields have been used in a compatible manner (e.g.
nanosecond timestamps) and applications that were written before these
fields were defined will never use them so filling them in is pointless
if there is a cost in doing so.

> > Instead, as was proposed in the last generation of this thread, there
> > should be AT_GET_{ATIME,MTIME,CTIME,BLOCKS,SIZE,NLINKS,...}, to return the
> > flags that the application actually wants.  
> 
> Well, I respectfully disagree. In my opinion, this adds unncessary
> complexity to an already targetted case, where only a handful of file
> systems will even care to optimize away a couple of fields. In that light, I
> really think we want simplicity here.

The difficult part is that the application knows which fields it is going
to use, and the filesystem knows which fields are expensive to fill in,
and by merging multiple fields together you are deciding arbitrarily that
you know better than either.  The main users will be common tools like
"ls --color" and "find", and they know exactly which fields are needed,
but they need to do a LOT of stats so making it efficient is worthwhile.

> The good news is that we still have a
> very good amount of space in the flags field. We could always go back if we
> _really_ find the need to micro-optimize to the point of per stat field
> flags. Again though, at this point I don't really think starting out with a
> slew of flags is a good idea.

Similar to the helper macros for S_IRWXU = (S_IRUSR|S_IWUSR|S_IXUSR)
applications won't necessarily need to specify all flags explicitly.

> > If none of them are specified, then the current behaviour of "get all
> > attributes" is kept.
> 
> This of course, is how it works today - if none of the new flags are
> specified you get the exact old behavior. In fact, most file systems will
> wind up just happily ignoring the new flags.

Sure, but applications that DO care about these fields will want to
be able to specify the ones they want, instead of specifying the ones
they don't want (which is impossible to do for flags that the application
doesn't know about).

> > Actually, despite what was said today, Lustre doesn't revoke the writing
> > client's locks when getting the file size, unlike block-based filesystems.
> > That said, it is still relatively a lot of work to query the size for
> > a widely-striped file since there may be a hundred servers holding the
> > file data and any one of them might have the end-of-file.
> 
> Ahh ok, but the point still stands, right? It costs a significant amount of
> performance to query each file stripe for the purposes of a size calculation
> during a simple 'ls'.

Correct.  I was just clarifying that this wasn't related to lock revocation
(i.e. the node doing the writing is not affected by the ls).

> Is the behavior as implemented by AT_SIZE sufficient
> for Lustre to avoid such a performance loss?

If you mean applications not specifying AT_STAT_SIZE, then yes.

> > I think yes, since there have been security bugs filed in rare cases when
> > other bits of kernel data are exposed to user space, even if just a byte
> > or two.
> 
> Yes, good idea. I'll implement that zeroing via the client fs ->setattr
> methods in my next pass. By the way, I think that adds to my argument that
> we don't really want to micro-optimize the flags yet - now all file systems
> will have to switch on a ton of attr flags during their zeroing... Granted
> that can be turned into an exported function but hopefully you see my
> point.

Actually, it appears that the kernel is already doing memset() for
the stat struct returned to userspace, so that the filesystems don't
have to do it themselves.

> Yeah, I'll look at how he did it. I just didn't want this to return ENOSYS
> or whatnot if a file system doesn't have ->getattr_lite() but there's other
> ways around that.

Oleg agrees that changing the ->getattr() prototype is better.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  reply	other threads:[~2009-04-07 17:52 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-07  8:00 [PATCH 1/2] [RFC] vfs: 'stat light' fstatat flags Mark Fasheh
2009-04-07 10:23 ` Andreas Dilger
2009-04-07 14:59   ` Oleg Drokin
2009-04-07 15:18   ` Sage Weil
2009-04-07 15:29   ` Mark Fasheh
2009-04-07 17:52     ` Andreas Dilger [this message]
2009-04-07 18:13       ` Jamie Lokier
2009-04-07 23:13   ` Russell Cattelan
2009-04-07 17:42 ` Jamie Lokier
2009-04-07 17:48   ` Oleg Drokin
2009-04-07 18:16     ` Jamie Lokier
2009-04-07 18:24       ` Oleg Drokin
2009-04-07 20:37         ` Trond Myklebust
2009-04-08 18:48           ` Jamie Lokier
2009-04-09 15:13             ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090407174821.GU3204@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=sage@newdream.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).