From: Eric Sandeen <sandeen@redhat.com>
To: Viji V Nair <viji@fedoraproject.org>
Cc: linux-ext4@vger.kernel.org, Theodore Tso <tytso@mit.edu>,
ext3-users@redhat.com
Subject: Re: optimising filesystem for many small files
Date: Sun, 18 Oct 2009 10:34:19 -0500
Message-ID: <4ADB357B.4030008@redhat.com>
In-Reply-To: <84c89ac10910180231p202fb5f1r2e192e9ac0b51509@mail.gmail.com>
Viji V Nair wrote:
> On Sun, Oct 18, 2009 at 3:56 AM, Theodore Tso <tytso@mit.edu> wrote:
>> On Sat, Oct 17, 2009 at 11:26:04PM +0530, Viji V Nair wrote:
>>> These files are not in a single directory; this is a pyramid
>>> structure. There are 15 pyramids in total, and going down from top to
>>> bottom, the subdirectories and files are multiplied by a factor of 4.
>>>
>>> The IO is scattered all over, and this is a single-disk filesystem.
>>>
>>> Since the Python application is creating the files, it creates
>>> multiple files in multiple subdirectories at a time.
>> What is the application trying to do, at a high level? Sometimes it's
>> not possible to optimize a filesystem against a badly designed
>> application. :-(
>
> The application reads the GIS data from a data source and
> plots the map tiles (256x256 PNG images) for different zoom
> levels. The tree output of the first zoom level is as follows:
>
> /tiles/00
> `-- 000
>     `-- 000
>         |-- 000
>         |   `-- 000
>         |       `-- 000
>         |           |-- 000.png
>         |           `-- 001.png
>         |-- 001
>         |   `-- 000
>         |       `-- 000
>         |           |-- 000.png
>         |           `-- 001.png
>         `-- 002
>             `-- 000
>                 `-- 000
>                     |-- 000.png
>                     `-- 001.png
>
> In each zoom level, the number of fourth-level directories is multiplied
> by a factor of four, and the number of PNG images is multiplied by the
> same factor.
>> It sounds like it is generating files distributed in subdirectories in
>> a completely random order. How are the files going to be read
>> afterwards? In the order they were created, or some other order
>> different from the order in which they were created?
>
> The applications we are using are modified versions of Mapnik and
> TileCache. These are single-threaded, so we run 4 processes at a
> time, which means only four images are being created at any point in
> time. Sometimes a single image takes around 20 seconds to create. I
> can see plenty of free system resources, such as memory and processors
> (the machine has 4G of RAM and 2 x Xeon 5420).
>
> I have checked for delays in the backend data source; it is on a 12Gbps
> LAN and shows no delay at all.
The delays are almost certainly due to the drive heads seeking like mad
as they attempt to write data all over the disk; most filesystems are
designed so that files in subdirectories are kept together, and new
subdirectories are placed at relatively distant locations to make room
for the files they will contain.
In the past I've also seen similar applications slow down because of the
heuristics the inode allocator uses when searching for new inodes, but that
was on ext3, and ext4 is significantly different in that regard...
> These images are also read in the same manner.
>
>> With sufficiently bad access patterns, there may not be a lot you
>> can do, other than (a) throw hardware at the problem, or (b) fix or
>> redesign the application to be more intelligent (if possible).
>>
>> - Ted
>>
>
> The filesystem was created with "-i 1024 -b 1024" to get a larger number
> of inodes, since 50% of the images are smaller than 10KB. I have also
> disabled access-time updates and set a large commit interval. Do you
> have any other recommendations for filesystem creation?
I think you'd do better to change how the application behaves, if possible.
I probably don't know enough about the app, but rather than:
/tiles/00
`-- 000
    `-- 000
        |-- 000
        |   `-- 000
        |       `-- 000
        |           |-- 000.png
        |           `-- 001.png
could it do:
/tiles/00/000000000000000000.png
/tiles/00/000000000000000001.png
...
for example? (or something similar)
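A minimal Python sketch of that renaming, assuming every tile path looks like the tree above: a zoom-level directory under /tiles followed by fixed-width three-digit components. The helper name flatten_tile_path is hypothetical and is not part of Mapnik or TileCache; it only shows how the nested components collapse into one file name per zoom level.

```python
from pathlib import PurePosixPath


def flatten_tile_path(nested: str, prefix: str = "/tiles") -> str:
    """Hypothetical helper: collapse a nested tile path into a flat one.

    /tiles/00/000/000/000/000/000/001.png -> /tiles/00/000000000000000001.png

    Assumes the first component below `prefix` is the zoom-level directory
    and that every component below it is fixed-width.
    """
    # relative_to() raises ValueError if `nested` is not under `prefix`.
    rel = PurePosixPath(nested).relative_to(prefix)
    # Unpacking also raises ValueError if nothing is left below `prefix`.
    zoom, *rest = rel.parts
    if not rest:
        raise ValueError(f"no tile components below the zoom dir in {nested!r}")
    # Concatenate the directory components and the file name, e.g.
    # ['000', '000', '000', '000', '000', '001.png'] -> '000...001.png'
    return str(PurePosixPath(prefix, zoom, "".join(rest)))


if __name__ == "__main__":
    for p in ("/tiles/00/000/000/000/000/000/000.png",
              "/tiles/00/000/000/000/000/000/001.png"):
        print(p, "->", flatten_tile_path(p))
```

Because every component is fixed-width, the flattened names sort in the same order as the nested paths, so the relative ordering of the tiles is preserved.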
-Eric
> Viji