linux-fsdevel.vger.kernel.org archive mirror
From: "Ragnar Kjørstad" <kernel@ragnark.vestdata.no>
To: Ulrich Drepper <drepper@redhat.com>
Cc: Rob Ross <rross@mcs.anl.gov>,
	Christoph Hellwig <hch@infradead.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Sage Weil <sage@newdream.net>, Brad Boyer <flar@allandria.com>,
	Anton Altaparmakov <aia21@cam.ac.uk>,
	Gary Grider <ggrider@lanl.gov>,
	linux-fsdevel@vger.kernel.org
Subject: Re: NFSv4/pNFS possible POSIX I/O API standards
Date: Sun, 17 Dec 2006 15:41:30 +0100
Message-ID: <20061217144130.GN25199@vestdata.no>
In-Reply-To: <45770850.4030003@redhat.com>

On Wed, Dec 06, 2006 at 10:13:36AM -0800, Ulrich Drepper wrote:
> Ragnar Kjørstad wrote:
> >I guess the code needs to be checked, but I would think that:
> >* ls
> >* find
> >* rm -r
> >* chown -R
> >* chmod -R
> >* rsync
> >* various backup software
> >* imap servers
> 
> Then somebody do the analysis.

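This is by no means a full analysis, but maybe someone will find it
useful anyway. All performance tests were run on a directory tree holding
the lkml archive in maildir format on a local ext3 filesystem. The
numbers are system call wall-clock times as seen through strace. Each
line shows the share of total system call time, the total time, the
average time per call, the number of calls, and the system call.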


I think Andreas already wrote that "ls --color" is the default in most
distributions and needs to stat every file.

ls --color -R kernel_old:
82.27% 176.37s  0.325ms  543332 lstat
17.61%  37.75s  5.860ms    6442 getdents64
 0.04%   0.09s  0.018ms    4997 write
 0.03%   0.06s 55.462ms       1 execve
 0.02%   0.04s  5.255ms       8 poll
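
As a sanity check on how to read these tables, the per-call figure is
just the total time divided by the number of calls. A small Python
sketch, assuming the five-column layout used here (percent, total time,
time per call, call count, system call); the function name is made up
for illustration:

```python
def per_call_ms(row: str) -> float:
    """Recompute the average time per call, in ms, from one summary row."""
    # Columns: percent, total seconds, average ms, call count, syscall name.
    _pct, total, _avg, calls, _name = row.split()
    return float(total.rstrip("s")) / int(calls) * 1000.0


row = "82.27% 176.37s  0.325ms  543332 lstat"
print(f"{per_call_ms(row):.3f} ms per lstat")
```

The recomputed value agrees with the 0.325ms in the lstat row to the
precision shown.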


"find" is already smart enough to not call stat when it's not needed,
and makes use of d_type when it's available. But in many cases stat is
still needed (such as with -user).

find kernel_old -not -user 1002:
83.63% 173.11s  0.319ms  543338 lstat
16.31%  33.77s  5.242ms    6442 getdents64
 0.03%   0.06s 62.882ms       1 execve
 0.01%   0.03s  6.904ms       4 poll
 0.01%   0.02s  8.383ms       2 connect
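
To illustrate the difference between the two cases, here is a rough
Python sketch (not find's actual code). os.scandir exposes the entry
type from the directory listing, so the recursion itself can usually
avoid a stat call on filesystems that fill in d_type, while an ownership
test like -user still costs one lstat per entry. The function names and
the uid argument are only for illustration.

```python
import os


def walk(root: str):
    """Yield a DirEntry for everything below root, not following symlinks."""
    with os.scandir(root) as it:
        for entry in it:
            yield entry
            # is_dir() is answered from d_type when the filesystem provides
            # it, so no stat call is needed just to decide where to recurse.
            if entry.is_dir(follow_symlinks=False):
                yield from walk(entry.path)


def not_owned_by(root: str, uid: int) -> list:
    """Return paths below root whose owner is not uid."""
    # The owner is not part of the directory entry: one lstat per entry.
    return [e.path for e in walk(root)
            if e.stat(follow_symlinks=False).st_uid != uid]
```

Something like not_owned_by("kernel_old", 1002) would roughly correspond
to the find command above, except that it does not test the root
directory itself.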

rm was a false alarm. It only uses stat to check for directories, and
it's already being smart about it, not statting the entries of
directories with nlink==2 (which cannot contain subdirectories).
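
The nlink trick relies on a traditional Unix convention: a directory's
link count is 2 plus the number of its subdirectories, so a count of
exactly 2 means no entry can be a directory. A hedged Python sketch of
the idea (not rm's actual code; the function name is hypothetical, and
filesystems that do not follow the convention fall through to the slow
path):

```python
import os


def has_subdirs(path: str) -> bool:
    """Report whether path contains subdirectories, statting as little as possible."""
    if os.lstat(path).st_nlink == 2:
        # Only "." and the entry in the parent link here: no subdirectories.
        return False
    # The link count is not conclusive (or not maintained, as on some
    # filesystems that always report 1), so look at the entries themselves.
    with os.scandir(path) as it:
        return any(e.is_dir(follow_symlinks=False) for e in it)
```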


chown uses stat to:
* check for directories / symlinks / regular files
* only change ownership on files with a specific existing ownership
* only change ownership if the requested owner does not match the
  current owner
* produce different output when ownership is actually changed than when
  it's not necessary (in verbose mode)
* reset the S_ISUID and S_ISGID bits after setting ownership in some
  cases
But it seems the most recent version will not use stat for every file
with typical options:

chown -R rk kernel_old:
93.30% 463.84s  0.854ms  543337 lchown
 6.67%  33.18s  5.151ms    6442 getdents64
 0.01%   0.04s  0.036ms    1224 brk
 0.00%   0.02s  5.830ms       4 poll
 0.00%   0.02s  0.526ms      38 open


chmod needs stat to do things like "u+w", but the current implementation
uses stat whether or not it's needed.

chmod -R o+w kernel_old:
62.50% 358.84s  0.660ms  543337 chmod
30.66% 176.05s  0.324ms  543336 lstat
 6.82%  39.17s  6.081ms    6442 getdents64
 0.01%   0.05s 54.515ms       1 execve
 0.01%   0.05s  0.037ms    1224 brk

chmod -R 0755 kernel_old:
61.21% 354.42s  0.652ms  543337 chmod
30.33% 175.61s  0.323ms  543336 lstat
 8.46%  48.96s  7.600ms    6442 getdents64
 0.01%   0.05s  0.037ms    1224 brk
 0.00%   0.01s 13.417ms       1 execve


Seems I was wrong about the imap servers. They (at least dovecot) do not
spend a significant amount of time doing stat when opening folders:
84.90%  24.75s  13.137ms    1884 writev
11.23%   3.27s 204.675ms      16 poll
 0.95%   0.28s   0.023ms   11932 open
 0.89%   0.26s   0.022ms   12003 pread
 0.76%   0.22s  12.239ms      18 getdents64
 0.63%   0.18s   0.015ms   11942 close
 0.63%   0.18s   0.015ms   11936 fstat


I don't think any code inspection is needed to determine that rsync
requires stat of every file, regardless of d_type.

Initial rsync:
rsync -a kernel_old copy
78.23% 2914.59s  5.305ms  549452 read
 6.69%  249.17s  0.046ms 5462876 write
 4.82%  179.44s  0.330ms  543338 lstat
 4.57%  170.33s  0.313ms  543355 open
 4.13%  153.79s  0.028ms 5468732 select

rsync on identical directories:
rsync -a kernel_old copy
61.81% 189.27s  0.348ms  543338 lstat
25.23%  77.25s 15.917ms    4853 select
12.72%  38.94s  6.045ms    6442 getdents64
 0.19%   0.57s  0.118ms    4840 write
 0.03%   0.08s  3.736ms      22 open
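
rsync needs a stat per file because its default quick check compares
each file's size and modification time against the destination, and
neither is available from the directory entry. A simplified Python
sketch of that decision (the function name is hypothetical, and real
rsync handles many more cases, such as checksums, permissions and
timestamp granularity):

```python
import os


def needs_transfer(src: str, dst: str) -> bool:
    """Return True if dst is missing or differs from src in size or mtime."""
    s = os.lstat(src)
    try:
        d = os.lstat(dst)
    except FileNotFoundError:
        return True
    # Compare whole seconds only, to sidestep timestamp granularity issues.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```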

tar cjgf incremental kernel_backup.tar kernel_old/
67.69% 2463.49s  3.030ms  812948 read
22.94% 834.85s  2.565ms  325471 write
 7.51% 273.45s  0.252ms 1086675 lstat
 0.94%  34.25s  2.658ms   12884 getdents64
 0.35%  12.63s  0.023ms  543370 open

incremental:
81.71% 171.62s  0.316ms  543342 lstat
16.81%  35.32s  2.741ms   12884 getdents64
 1.40%   2.94s  1.930ms    1523 write
 0.04%   0.09s 86.668ms       1 wait4
 0.02%   0.03s 34.300ms       1 execve




> And please an analysis which takes into 
> account that some programs might need to be adapted to take advantage of 
> d_type or non-optional data from the proposed statlite.


d_type may be useful in some cases, but I think mostly as a replacement
for the nlink==2 hacks for directory recursion. There are clearly many
stat-heavy examples that cannot be optimized with d_type.


> Plus, how often are these commands really used on such filesystems?  I'd 
> hope that chown -R or so is a once in a lifetime thing on such 
> filesystems and not worth optimizing for.

I think you're right that chown/chmod are rare and should not be the
main focus. The other examples on my list are probably better. And they
are just examples - there are probably many, many others as well.

And what do you mean by "such filesystems"? I know this came up in the
context of clustered filesystems, but unless I'm missing something
fundamental here, readdirplus could be just as useful on local
filesystems as on clustered filesystems if it allowed parallel execution
of the getattrs.
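
To make the idea concrete: readdirplus as discussed here does not exist
as a system call, but the effect of issuing the getattrs in parallel can
be approximated from user space. A rough Python sketch, assuming a
thread pool is an acceptable stand-in (the function name and the worker
count are made up):

```python
import os
from concurrent.futures import ThreadPoolExecutor


def readdir_plus(path: str, workers: int = 16) -> dict:
    """Return {name: stat result} for every entry in path."""
    names = os.listdir(path)
    # os.lstat releases the interpreter lock during the system call, so
    # several requests can be in flight at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = pool.map(lambda n: os.lstat(os.path.join(path, n)), names)
        return dict(zip(names, stats))
```

Whether this helps on a local disk depends on how well the filesystem
and the I/O scheduler can reorder the resulting requests; an in-kernel
interface would have more room to do so.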

-- 
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter

