linux-ext4.vger.kernel.org archive mirror
From: Eric Sandeen <sandeen@redhat.com>
To: Norbert Preining <preining@logic.at>
Cc: "Ted Ts'o" <tytso@mit.edu>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: Ext4 slow on links
Date: Wed, 20 Jun 2012 23:05:59 -0500
Message-ID: <4FE29DA7.40405@redhat.com>
In-Reply-To: <20120621022818.GD9669@gamma.logic.tuwien.ac.at>

On 6/20/12 9:28 PM, Norbert Preining wrote:
> Hi Eric,
> 
> thanks a lot for looking into that.
> 
> On Wed, 20 Jun 2012, Eric Sandeen wrote:
>> so almost all reads, and no read merges; almost 35 megabytes read and every
>> one was a small 4k IO.
> 
> Ouch, that hurts.
> 
> On Wed, 20 Jun 2012, Eric Sandeen wrote:
>> Would you be willing to provide an "e2image -r" image of the filesystem?
> 
> OK, it has been running for a few hours now and I guess I am far from
> finished, since there are 350+G on the fs and the compressed image
> is already 200M.
> 
> Is it fine to do it on a running system, or do I have to boot
> from USB or so?

Well, don't bother, sorry.  See below.  Zach had it right.

> If it is not toooo big I will try to upload it somewhere
> you can access it.
> 
> On Wed, 20 Jun 2012, Eric Sandeen wrote:
>> Oh, but Zach Brown reminds me that if we stat the entries in getdents/hash
>> order, it's roughly random w.r.t. disk location.  Newer utils will sort into
>> inode order, I think(?)  Might be interesting to strace the ls -l and see
>> if it's doing it in inode order, or not.
> 
> OK, is there a special option to strace, or -e trace=all?

if you do 

# strace -v -o outfile ls -l 

you'll see things like:

getdents(3, {{d_ino=249052, d_off=186216735, d_reclen=32, d_name="file3"} {d_ino=245882, d_off=473549160, d_reclen=24, d_name="."} {d_ino=249051, d_off=516459536, d_reclen=32, d_name="file2"} {d_ino=249055, d_off=545762253, d_reclen=32, d_name="file6"} {d_ino=249049, d_off=550416647, d_reclen=32, d_name="file1"} ...

and from there you can see that the entries are not returned in inode order (and therefore not in disk order).
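A quick way to check this without reading strace output: a minimal C sketch, assuming Linux/glibc, that walks a directory with readdir() (which glibc implements on top of getdents) and reports whether d_ino ever goes backwards. The function name readdir_is_inode_ordered is hypothetical, not from any existing tool.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if readdir() hands back the entries of `path` in ascending
 * inode order, 0 if it does not, -1 if the directory can't be opened.
 * "." and ".." are skipped so only real entries are compared. */
int readdir_is_inode_ordered(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    struct dirent *de;
    ino_t prev = 0;
    int ordered = 1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (de->d_ino < prev)
            ordered = 0;   /* inode number went backwards */
        prev = de->d_ino;
    }
    closedir(d);
    return ordered;
}
```

On a large hashed (dir_index) ext4 directory I would expect this to return 0; a small directory may happen to come back in order.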

and the lstat() calls after that are also out of order:

# grep lstat outfile
lstat("file3", {st_dev=makedev(8, 8), st_ino=249052, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
lstat("file2", {st_dev=makedev(8, 8), st_ino=249051, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
lstat("file6", {st_dev=makedev(8, 8), st_ino=249055, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
lstat("file1", {st_dev=makedev(8, 8), st_ino=249049, st_mode=S_IFLNK|0777, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=13, st_atime=2012/06/20-22:13:08, st_mtime=2012/06/20-22:13:07, st_ctime=2012/06/20-22:13:07}) = 0
...

Later on you'll see the readlink() calls:

# grep readlink outfile
readlink("file3", "../dir2/file3", 14)  = 13
readlink("file2", "../dir2/file2", 14)  = 13
readlink("file6", "../dir2/file6", 14)  = 13
readlink("file1", "../dir2/file1", 14)  = 13
...

etc.
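To reproduce this access pattern without ls, here is a minimal C sketch, assuming Linux/glibc: one stat per entry plus a readlink for each symlink, strictly in readdir() order. It uses fstatat(..., AT_SYMLINK_NOFOLLOW) and readlinkat() relative to the directory fd as stand-ins for the lstat() and readlink() calls seen in the trace; walk_in_readdir_order is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Visit entries in the order readdir() returns them (hash order on a
 * dir_index ext4 directory). Returns the number of symlinks read, or
 * -1 if the directory cannot be opened. */
long walk_in_readdir_order(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    int dfd = dirfd(d);
    long links = 0;
    struct dirent *de;
    char target[4096];

    while ((de = readdir(d)) != NULL) {
        struct stat st;
        /* AT_SYMLINK_NOFOLLOW makes fstatat() behave like lstat(). */
        if (fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (S_ISLNK(st.st_mode)) {
            ssize_t n = readlinkat(dfd, de->d_name, target, sizeof target - 1);
            if (n >= 0) {
                target[n] = '\0';   /* readlinkat() does not NUL-terminate */
                links++;
            }
        }
    }
    closedir(d);
    return links;
}
```

Each fstatat() here has to read the inode, so when the inode numbers jump around, every call can turn into a small random read, which matches the 4k-read pattern described earlier in the thread.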

Hm.  Upstream coreutils fixed this for rm and some other ops:

http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a

# grep unlink /tmp/rm-strace 
unlink("file1")                         = 0
unlink("file10")                        = 0
unlink("file2")                         = 0
unlink("file3")                         = 0
unlink("file4")                         = 0
unlink("file5")                         = 0
unlink("file6")                         = 0
unlink("file7")                         = 0
unlink("file8")                         = 0
unlink("file9")                         = 0

but maybe not for ls -l.
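The coreutils code itself isn't reproduced here. The following is only a minimal C sketch of the general idea, assuming Linux/glibc: read every name first, sort the list by d_ino with qsort(), then do the per-file work in that order. The names struct ent, read_sorted_by_inode and free_entries are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One directory entry: inode number plus a heap copy of the name. */
struct ent {
    ino_t ino;
    char *name;
};

static int cmp_ino(const void *a, const void *b)
{
    ino_t x = ((const struct ent *)a)->ino;
    ino_t y = ((const struct ent *)b)->ino;
    return (x > y) - (x < y);   /* avoids overflow of x - y */
}

void free_entries(struct ent *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i].name);
    free(v);
}

/* Read all entries of `path` (minus "." and ".."), sorted by inode.
 * Returns the array and stores its length in *count, or NULL on error.
 * Free the result with free_entries(). */
struct ent *read_sorted_by_inode(const char *path, size_t *count)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return NULL;

    size_t n = 0, cap = 64;
    struct ent *v = malloc(cap * sizeof *v);
    struct dirent *de;

    while (v != NULL && (de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {
            struct ent *tmp = realloc(v, 2 * cap * sizeof *v);
            if (tmp == NULL) { free_entries(v, n); v = NULL; break; }
            v = tmp;
            cap *= 2;
        }
        v[n].ino = de->d_ino;
        v[n].name = strdup(de->d_name);
        if (v[n].name == NULL) { free_entries(v, n); v = NULL; break; }
        n++;
    }
    closedir(d);
    if (v == NULL)
        return NULL;

    qsort(v, n, sizeof *v, cmp_ino);
    *count = n;
    return v;
}
```

A caller would then loop over v[0..n-1] and lstat()/readlink()/unlink() each name. The sort is pure in-memory work, which should be cheap next to one random 4k read per inode.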

You could see whether this LD_PRELOAD library works for you:

http://git.kernel.org/?p=fs/ext2/e2fsprogs.git;a=blob_plain;f=contrib/spd_readdir.c

build & enable with:

gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
export LD_PRELOAD=`pwd`/spd_readdir.so

and see if that addresses the problem.

Here, it does for me:

# grep readlink outfile2 
readlink("file1", "../dir2/file1"..., 14) = 13
readlink("file10", "../dir2/file10"..., 15) = 14
readlink("file2", "../dir2/file2"..., 14) = 13
readlink("file3", "../dir2/file3"..., 14) = 13
readlink("file4", "../dir2/file4"..., 14) = 13
readlink("file5", "../dir2/file5"..., 14) = 13

I'm guessing that operating in inode order should help
you a bit, at least.  I tested on a directory with 10,000 long
symlinks, with and without the sorting, and the difference is
pretty clear.

Sorted took 2.6s; unsorted took 52s.

And you can see why:

http://people.redhat.com/esandeen/sorted_unsorted.png
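For anyone wanting to repeat a sorted vs. unsorted comparison, here is a rough C harness, assuming Linux/glibc and a directory of symlinks like the one described above. time_pass() is a hypothetical name; it times only the stat/readlink phase (via fstatat()/readlinkat()), optionally after sorting the names by d_ino. This is not how the numbers above were produced. Drop the page cache before each run (for example `echo 3 > /proc/sys/vm/drop_caches` as root), otherwise the second pass is mostly served from memory.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Name plus inode number; 256 bytes covers NAME_MAX (255) + NUL on Linux. */
struct ent {
    ino_t ino;
    char name[256];
};

static int cmp_ino(const void *a, const void *b)
{
    ino_t x = ((const struct ent *)a)->ino;
    ino_t y = ((const struct ent *)b)->ino;
    return (x > y) - (x < y);
}

/* Time one "ls -l"-like pass (stat + readlink) over `path`.
 * sort_by_inode != 0: visit names in d_ino order; 0: readdir order.
 * Returns elapsed seconds for the stat/readlink phase, or -1.0 on error. */
double time_pass(const char *path, int sort_by_inode)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1.0;
    int dfd = dirfd(d);

    size_t n = 0, cap = 64;
    struct ent *v = malloc(cap * sizeof *v);
    if (v == NULL) {
        closedir(d);
        return -1.0;
    }

    /* Phase 1: collect names (not timed). */
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (n == cap) {
            struct ent *tmp = realloc(v, 2 * cap * sizeof *v);
            if (tmp == NULL) {
                free(v);
                closedir(d);
                return -1.0;
            }
            v = tmp;
            cap *= 2;
        }
        v[n].ino = de->d_ino;
        snprintf(v[n].name, sizeof v[n].name, "%s", de->d_name);
        n++;
    }

    if (sort_by_inode)
        qsort(v, n, sizeof *v, cmp_ino);

    /* Phase 2: the timed stat/readlink loop. */
    struct timespec t0, t1;
    char target[4096];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (fstatat(dfd, v[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0 &&
            S_ISLNK(st.st_mode))
            (void)readlinkat(dfd, v[i].name, target, sizeof target);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(v);
    closedir(d);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Call time_pass(dir, 0) and time_pass(dir, 1) from separate runs, with the cache dropped before each, and compare the two results.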

Meanwhile, I can ask Jim about coreutils and ls -l.

-Eric

> Best wishes
> 
> Norbert

Thread overview: 16+ messages
2012-06-20  0:20 Ext4 slow on links Norbert Preining
2012-06-20  2:19 ` Ted Ts'o
2012-06-20  3:38   ` Norbert Preining
2012-06-20  3:57     ` Eric Sandeen
2012-06-20  4:01       ` Norbert Preining
2012-06-20  5:18       ` Norbert Preining
2012-06-20 14:07         ` Eric Sandeen
2012-06-20 19:35       ` Eric Sandeen
2012-06-21  2:28         ` Norbert Preining
2012-06-21  4:05           ` Eric Sandeen [this message]
2012-06-21  4:50             ` Norbert Preining
2012-06-21  5:18               ` Andreas Dilger
2012-06-21  6:55                 ` Norbert Preining
2012-06-22  9:53             ` Bernd Schubert
2012-06-22 14:08               ` Ted Ts'o
2012-06-20  3:15 ` Eric Sandeen
