linux-ext4.vger.kernel.org archive mirror
From: Viji V Nair <viji@fedoraproject.org>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>,
	ext3-users@redhat.com, linux-ext4@vger.kernel.org
Subject: Re: optimising filesystem for many small files
Date: Sun, 18 Oct 2009 22:03:42 +0530
Message-ID: <84c89ac10910180933p3ddb9947ye464a19ba29e4ccc@mail.gmail.com>
In-Reply-To: <4ADB357B.4030008@redhat.com>

On Sun, Oct 18, 2009 at 9:04 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> Viji V Nair wrote:
>>
>> On Sun, Oct 18, 2009 at 3:56 AM, Theodore Tso <tytso@mit.edu> wrote:
>>>
>>> On Sat, Oct 17, 2009 at 11:26:04PM +0530, Viji V Nair wrote:
>>>>
>>>> These files are not in a single directory; this is a pyramid
>>>> structure. There are 15 pyramids in total, and going down from top to
>>>> bottom the subdirectories and files are multiplied by a factor of 4.
>>>>
>>>> The IO is scattered all over!!!! and this is a single disk file system.
>>>>
>>>> Since the Python application is creating the files, it writes
>>>> multiple files to multiple subdirectories at a time.
>>>
>>> What is the application trying to do, at a high level?  Sometimes it's
>>> not possible to optimize a filesystem against a badly designed
>>> application.  :-(
>>
>> The application is reading the gis data from a data source and
>> plotting the map tiles (256x256, png images) for different zoom
>> levels. The tree output of the first zoom level is as follows
>>
>> /tiles/00
>> `-- 000
>>    `-- 000
>>        |-- 000
>>        |   `-- 000
>>        |       `-- 000
>>        |           |-- 000.png
>>        |           `-- 001.png
>>        |-- 001
>>        |   `-- 000
>>        |       `-- 000
>>        |           |-- 000.png
>>        |           `-- 001.png
>>        `-- 002
>>            `-- 000
>>                `-- 000
>>                    |-- 000.png
>>                    `-- 001.png
>>
>> At each zoom level the fourth-level directories are multiplied by a
>> factor of four. The number of PNG images is multiplied by the
>> same factor.
>>>
>>> It sounds like it is generating files distributed in subdirectories in
>>> a completely random order.  How are the files going to be read
>>> afterwards?  In the order they were created, or some other order
>>> different from the order in which they were created?
>>
>> The applications we are using are modified versions of mapnik and
>> tilecache. These are single threaded, so we are running 4 processes at a
>> time, which means only four images are being created at any point in
>> time. Sometimes a single image takes around 20 sec to create. I
>> can see that plenty of system resources are free: memory, processors etc.
>> (the machine has 4G of RAM and 2 x 5420 XEON).
>>
>> I have checked for delay in the backend data source; it is on a 12Gbps
>> LAN and there is no delay at all.
>
> The delays are almost certainly due to the drive heads seeking like mad as
> they attempt to write data all over the disk; most filesystems are designed
> so that files in subdirectories are kept together, and new subdirectories
> are placed at relatively distant locations to make room for the files they
> will contain.
>
> In the past I've seen similar applications also slow down due to new inode
> searching heuristics in the inode allocator, but that was on ext3 and ext4
> is significantly different in that regard...
>
>> These images are also read in the same manner.
>>
>>> With sufficiently bad access patterns, there may not be a lot you
>>> can do, other than (a) throw hardware at the problem, or (b) fix or
>>> redesign the application to be more intelligent (if possible).
>>>
>>>                                                   - Ted
>>>
>>
>> The file system is created with "-i 1024 -b 1024" for a larger number of
>> inodes; 50% of the images are less than 10KB. I have disabled
>> access time updates and set a large commit interval as well. Do you have
>> any other recommendations for the file system creation?
>
> I think you'd do better to change, if possible, how the application behaves.
>
> I probably don't know enough about the app but rather than:
>
> /tiles/00
> `-- 000
>    `-- 000
>        |-- 000
>        |   `-- 000
>        |       `-- 000
>        |           |-- 000.png
>        |           `-- 001.png
>
> could it do:
>
> /tiles/00/000000000000000000.png
> /tiles/00/000000000000000001.png
>
> ...
>
> for example?  (or something similar)
>
> -Eric

The tilecache application creates this directory structure, so we
would need to change both it and our own application to use a new
directory tree.
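
As a rough illustration of the flat layout Eric suggests above, here is a
minimal sketch of the path mapping in Python. The function name
flatten_tile_path and the "levels" parameter are hypothetical and not part
of tilecache; the sketch only assumes that the last six path components
(five directories plus the file name) are the ones to be merged, as in the
tree shown earlier.

```python
from pathlib import PurePosixPath


def flatten_tile_path(nested: str, levels: int = 6) -> str:
    """Collapse the last `levels` path components into one file name.

    Hypothetical helper: maps /tiles/00/000/000/000/000/000/001.png
    to /tiles/00/000000000000000001.png.
    """
    parts = PurePosixPath(nested).parts
    if len(parts) <= levels:
        raise ValueError(f"path has too few components: {nested!r}")

    head = parts[:-levels]   # e.g. ('/', 'tiles', '00')
    tail = parts[-levels:]   # five directory names plus the file name

    last = PurePosixPath(tail[-1])
    # Join the directory names and the file stem, then re-attach the suffix.
    flat_name = "".join(tail[:-1]) + last.stem + last.suffix
    return str(PurePosixPath(*head) / flat_name)


if __name__ == "__main__":
    print(flatten_tile_path("/tiles/00/000/000/000/000/000/001.png"))
```

This only covers the naming side; the code that writes and later reads the
tiles would have to use the same mapping.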

>
>> Viji
>
>