From: Phillip Susi <psusi@cfl.rr.com>
To: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Large directories and poor order correlation
Date: Mon, 14 Mar 2011 16:24:48 -0400 [thread overview]
Message-ID: <4D7E7990.90209@cfl.rr.com> (raw)
[-- Attachment #1: Type: text/plain, Size: 2692 bytes --]
Shouldn't copying or extracting or otherwise populating a large
directory of many small files at the same time result in a strong
correlation between the order the names appear in the directory, and the
order their data blocks are stored on disk, and thus, read performance
should not be negatively impacted by fragmentation?
Background:
While migrating a server to a new system, I noticed that it was taking
forever to rsync my Maildir. It seemed to be due to very low disk
throughput due to seeking. I confirmed this with timing tests using
both tar and dump to /dev/zero to test reading the files after dropping
cache. I noticed that tar was horribly slow, and dump was much better.
I surmise that this was due to poor correlation between the order of
file names in the directory and their data blocks on disk. I would
expect this on an old fs that has grown slowly over a few years, and
that this would mostly go away after copying the files to a new system.
I found some surprises. The big one being that after copying the files
to the new system, they still have a poor correlation between directory
and inode order.
Details:
The old system was a single disk with sequential throughput of 51 mb/s,
and the new one is a 4 disk raid-5 with sequential throughput of 160 mb/s.
On the old system, tar took 30 minutes, and dump took 8 minutes. On the
new system, tar took 18 minutes, and dump took a mere 30 seconds!
On just the linux-kernel Maildir, which has 85,364 files taking up 660M
of space, dump on the old system clocks in at 11m41s and only 10s on the
new system.
I wrote a python script to actually measure the correlation between name
and inode order, inode and data block order, and name to data block
order. It enumerates the files and counts it as being either in or out
of order by comparing the inode number to the last. I expected to see a
much better correlation on the new system, but I did not.
On the new system the linux-kernel Maildir gets these results:
Name to inode correlation: 0.50002342908
Name to block correlation: 0.49996485638
Inode to block correlation: 0.889239023476
And on the old system:
Name to inode correlation: 0.499531418397
Name to block correlation: 0.499554847477
Inode to block correlation: 0.987418583946
The other folders get similar results. You can see that the inode to
block correlation is improved, but it wasn't very bad to begin with so
going from 8 minutes to 30 seconds seems to be a good deal more
improvement than this would explain. What concerns me though, is the
name to inode correlation went from terrible to slightly worse, which is
why tar still is horribly slow.
Attaching the script for reference.
[-- Attachment #2: correlation.py --]
[-- Type: text/x-python, Size: 1390 bytes --]
#!/usr/bin/python
import os
from stat import *
import fcntl
import array
names = os.listdir('.')
lastino = 0
name_to_ino_in = 0
name_to_ino_out = 0
lastblock = 0
name_to_block_in = 0
name_to_block_out = 0
iblocks = list()
inode_to_block_in = 0
inode_to_block_out = 0
for file in names :
try :
st = os.stat(file)
except OSError:
continue
if not S_ISREG(st.st_mode) :
continue
if st.st_ino > lastino :
name_to_ino_in += 1
else : name_to_ino_out += 1
lastino = st.st_ino
f = open(file)
buf = array.array('I', [0])
err = fcntl.ioctl(f.fileno(), 1, buf)
if err != 0 :
print "ioctl failed on " + f
block = buf[0]
if block != 0 :
if block > lastblock :
name_to_block_in += 1
else : name_to_block_out += 1
lastblock = block
iblocks.append((st.st_ino,block))
print "Name to inode correlation: " + str(float(name_to_ino_in) / float((name_to_ino_in + name_to_ino_out)))
print "Name to block correlation: " + str(float(name_to_block_in) / float((name_to_block_in + name_to_block_out)))
iblocks.sort()
lastblock = 0
for i in iblocks:
if i[1] > lastblock:
inode_to_block_in += 1
else: inode_to_block_out += 1
lastblock = i[1]
print "Inode to block correlation: " + str(float(inode_to_block_in) / float((inode_to_block_in + inode_to_block_out)))
next reply other threads:[~2011-03-14 20:24 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-14 20:24 Phillip Susi [this message]
2011-03-14 20:37 ` Large directories and poor order correlation Eric Sandeen
2011-03-14 20:52 ` Phillip Susi
2011-03-14 21:12 ` Eric Sandeen
2011-03-14 21:52 ` Ted Ts'o
2011-03-14 23:43 ` Phillip Susi
2011-03-15 0:14 ` Ted Ts'o
2011-03-15 14:01 ` Phillip Susi
2011-03-15 14:33 ` Rogier Wolff
2011-03-15 14:36 ` Ric Wheeler
2011-03-15 17:08 ` Ted Ts'o
2011-03-15 19:08 ` Phillip Susi
2011-03-16 1:50 ` Ted Ts'o
2011-03-15 7:59 ` Florian Weimer
2011-03-15 11:06 ` Theodore Tso
2011-03-15 11:23 ` Ric Wheeler
2011-03-15 11:38 ` Theodore Tso
2011-03-15 13:33 ` Rogier Wolff
2011-03-15 17:18 ` Ted Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D7E7990.90209@cfl.rr.com \
--to=psusi@cfl.rr.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).