From: Viji V Nair <viji@fedoraproject.org>
To: Theodore Tso <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>,
ext3-users@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: optimising filesystem for many small files
Date: Sun, 18 Oct 2009 15:01:46 +0530
Message-ID: <84c89ac10910180231p202fb5f1r2e192e9ac0b51509@mail.gmail.com>
In-Reply-To: <20091017222619.GA10074@mit.edu>
On Sun, Oct 18, 2009 at 3:56 AM, Theodore Tso <tytso@mit.edu> wrote:
> On Sat, Oct 17, 2009 at 11:26:04PM +0530, Viji V Nair wrote:
>> These files are not in a single directory; this is a pyramid
>> structure. There are 15 pyramids in total, and going from top to
>> bottom the subdirectories and files are multiplied by a factor of 4.
>>
>> The I/O is scattered all over, and this is a single-disk file system.
>>
>> Since the Python application is creating the files, it is writing
>> multiple files into multiple subdirectories at the same time.
>
> What is the application trying to do, at a high level? Sometimes it's
> not possible to optimize a filesystem against a badly designed
> application. :-(
The application reads GIS data from a data source and renders map
tiles (256x256 PNG images) for different zoom levels. The tree
output for the first zoom level is as follows:
/tiles/00
`-- 000
    `-- 000
        |-- 000
        |   `-- 000
        |       `-- 000
        |           |-- 000.png
        |           `-- 001.png
        |-- 001
        |   `-- 000
        |       `-- 000
        |           |-- 000.png
        |           `-- 001.png
        `-- 002
            `-- 000
                `-- 000
                    |-- 000.png
                    `-- 001.png
At each successive zoom level, the number of fourth-level directories
is multiplied by a factor of four, and the number of PNG images is
multiplied by the same factor.
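As an illustration, the tree above appears to follow a TileCache-style disk layout: a two-digit zoom directory, then the tile's x and y coordinates each split into three three-digit components (millions, thousands, units), with the last y component used as the file name. The sketch assumes that layout; `tile_path` and `tiles_at_level` are hypothetical helpers, not TileCache's own API. It also shows the factor-of-four growth, starting from the six PNGs visible at the first zoom level.

```python
# Hypothetical sketch, assuming a TileCache-style disk layout:
#   <root>/<zz>/<x1>/<x2>/<x3>/<y1>/<y2>/<y3>.png
# where x and y are split into millions, thousands and units (3 digits each).
import os


def tile_path(root: str, z: int, x: int, y: int, ext: str = "png") -> str:
    """Return the on-disk path of tile (z, x, y) under root."""
    dirs = [
        "%02d" % z,
        "%03d" % (x // 1_000_000),
        "%03d" % (x // 1000 % 1000),
        "%03d" % (x % 1000),
        "%03d" % (y // 1_000_000),
        "%03d" % (y // 1000 % 1000),
    ]
    return os.path.join(root, *dirs, "%03d.%s" % (y % 1000, ext))


def tiles_at_level(base_count: int, steps: int) -> int:
    """Tile count `steps` zoom levels below a level holding base_count tiles,
    assuming each level multiplies the count by four."""
    return base_count * 4 ** steps


if __name__ == "__main__":
    print(tile_path("/tiles", 0, 2, 1))  # /tiles/00/000/000/002/000/000/001.png (POSIX)
    print([tiles_at_level(6, k) for k in range(4)])  # [6, 24, 96, 384]
```

Because neighbouring tiles land in sibling directories, rendering tiles in an arbitrary (x, y) order touches many directories in quick succession.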
>
> It sounds like it is generating files distributed in subdirectories in
> a completely random order. How are the files going to be read
> afterwards? In the order they were created, or some other order
> different from the order in which they were written?
The applications we are using are modified versions of Mapnik and
TileCache. These are single-threaded, so we run 4 processes at a
time, which means only four images are being created at any point in
time. Sometimes a single image takes around 20 seconds to create. I
can see that plenty of system resources are free: memory, processors,
etc. (the machine has 4 GB of RAM and 2 x Xeon 5420 CPUs).
I have checked the latency of the backend data source; it is on a
12 Gbps LAN and shows no delay at all.
These images are also read in the same manner.
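To put those numbers in perspective, here is a back-of-the-envelope estimate of rendering throughput, using the 4 processes and the ~20 seconds per image quoted above. Treating 20 s as the cost of every image is a worst-case assumption, and the helper names are hypothetical.

```python
# Back-of-the-envelope rendering throughput (hypothetical helpers).
# Inputs from the message: 4 single-threaded processes, ~20 s per image.
# Treating 20 s as the cost of every image is a worst-case assumption.


def tiles_per_hour(processes: int, seconds_per_tile: float) -> float:
    """Files created per hour when every process is busy rendering."""
    return processes * 3600.0 / seconds_per_tile


def hours_to_render(tile_count: int, processes: int, seconds_per_tile: float) -> float:
    """Wall-clock hours to render tile_count tiles at that rate."""
    return tile_count / tiles_per_hour(processes, seconds_per_tile)


if __name__ == "__main__":
    print(tiles_per_hour(4, 20))  # 720.0
    print(hours_to_render(7200, 4, 20))  # 10.0
```

At 20 s per image, four processes create about 720 files per hour, i.e. one new file every 5 seconds on average.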
>
> With sufficiently bad access patterns, there may not be a lot you
> can do, other than (a) throw hardware at the problem, or (b) fix or
> redesign the application to be more intelligent (if possible).
>
> - Ted
>
The file system was created with "-i 1024 -b 1024" to get a larger
number of inodes, since 50% of the images are smaller than 10 KB. I
have also disabled access-time updates and set a large commit
interval. Do you have any other recommendations for creating the
file system?
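The trade-off behind "-i 1024 -b 1024" can be estimated with simple arithmetic. This is a rough sketch, assuming file data is rounded up to whole blocks and that mke2fs creates one inode per bytes-per-inode of space; it ignores the journal, bitmaps, and indirect/extent blocks. The inode size is an input, not a known value for this file system, and the file sizes in the example are illustrative, not measured.

```python
# Rough capacity arithmetic for "-i 1024 -b 1024" (hypothetical helpers).
# Assumes data is rounded up to whole blocks and one inode is created per
# bytes_per_inode of space; ignores journal, bitmaps, indirect/extent blocks.


def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes a file's data occupies once rounded up to whole blocks."""
    return (file_size + block_size - 1) // block_size * block_size


def slack_bytes(file_sizes, block_size: int) -> int:
    """Total space wasted in partially filled last blocks."""
    return sum(allocated_bytes(s, block_size) - s for s in file_sizes)


def inode_count(fs_bytes: int, bytes_per_inode: int) -> int:
    """Approximate number of inodes mke2fs creates for a given -i value."""
    return fs_bytes // bytes_per_inode


def inode_table_bytes(fs_bytes: int, bytes_per_inode: int, inode_size: int) -> int:
    """Space taken by inode tables (inode_size is an assumed input)."""
    return inode_count(fs_bytes, bytes_per_inode) * inode_size


if __name__ == "__main__":
    sizes = [3000, 10 * 1024]  # illustrative file sizes, not measured data
    for bs in (1024, 4096):
        print(bs, slack_bytes(sizes, bs))
```

For example, a 3,000-byte tile occupies 3,072 bytes with 1 KB blocks but 4,096 bytes with 4 KB blocks; and with -i 1024 and 256-byte inodes, the inode tables alone take a quarter of the disk.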
Viji