linux-ext4.vger.kernel.org archive mirror
From: Viji V Nair <viji@fedoraproject.org>
To: Jon Burgess <jburgess777@googlemail.com>
Cc: Matija Nalis <mnalis-ml@voyager.hr>,
	linux-ext4@vger.kernel.org, ext3-users@redhat.com
Subject: Re: optimising filesystem for many small files
Date: Sun, 18 Oct 2009 21:59:12 +0530
Message-ID: <84c89ac10910180929t2bebfd3eq26eb318475a24fd4@mail.gmail.com>
In-Reply-To: <1255878457.27380.138.camel@localhost.localdomain>

On Sun, Oct 18, 2009 at 8:37 PM, Jon Burgess <jburgess777@googlemail.com> wrote:
> On Sun, 2009-10-18 at 18:44 +0530, Viji V Nair wrote:
>> On Sun, Oct 18, 2009 at 5:11 PM, Matija Nalis <mnalis-ml@voyager.hr> wrote:
>> > On Sun, Oct 18, 2009 at 03:01:46PM +0530, Viji V Nair wrote:
>> >> The applications we are using are modified versions of mapnik and
>> >> tilecache; these are single-threaded, so we are running 4 processes at a
>> >
>> > How does it scale if you reduce the number of processes, especially if you
>> > run just one of them? As this is just a single disk, 4 simultaneous
>> > readers/writers would probably *totally* kill it with seeks.
>> >
>> > I suspect it might even run faster with just 1 process than with 4 of
>> > them...
>>
>> With one process it takes 6 seconds.
>
> That seems a little slow. Have you looked into optimising your mapnik
> setup? The mapnik-users list or IRC channel is a good place to ask[1].
>
> For comparison, the OpenStreetMap tile server typically renders an 8x8
> block of 64 tiles in about 1 second, although the time varies greatly
> depending on the amount of data within the tiles.
>
>> >
>> >> time, so only four images are being created at any one
>> >> time. Sometimes a single image takes around 20 sec to create. I
>> >
>> > Is that 20 secs just the write time for a precomputed file of 10k?
>> > Or does it also include reading, processing and writing?
>>
>> This includes processing and writing.
>>
>> >
>> >> can see that lots of system resources are free: memory, processors, etc.
>> >> (4 GB RAM, 2 x Xeon 5420)
>
> 4GB may be a little small. Have you checked whether the IO reading your
> data sources is the bottleneck?

I will be upgrading the RAM, but I didn't see any swap usage while
running these applications.
The data source is on a different machine, PostgreSQL + PostGIS. I have
checked the IO and it looks fine. It is a 50 GB DB running on a 16 GB
dual-Xeon box.

>
>> > If you can modify the hardware setup, RAID10 (better with many smaller disks
>> > than with fewer bigger ones) should help *very* much. Flash disks (SSDs) of
>> > appropriate size are an even better option (as seek time on them is a few
>> > orders of magnitude smaller). Also probably more RAM (unless your full
>> > dataset is much smaller than 2 GB, which I doubt).
>> >
>> > On the other hand, have you tried testing some other filesystems?
>> > I've had much better performance with lots of small files on XFS (but that
>> > was on a big RAID5, so YMMV), for example.
>> >
>> > --
>> > Opinions above are GNU-copylefted.
>> >
>>
>> I have not tried XFS, but I have tried reiserfs. I could not see a large
>> difference compared with ext4 created using mkfs.ext4 -T small. I could see
>> that reiserfs gives better performance on overwrites, but not on new writes.
>> Sometimes we overwrite existing images with new ones.
>>
>> There are now 50 million files in total; soon (within a year) this will
>> grow to 1 billion. I know that we should move ahead with the hardware
>> upgrades, but filesystem access is also a large concern for us. These
>> images are accessed over the internet, and we expect 100 million
>> visits every month. For each user we need to transfer at least 3 MB of
>> data.
>
> Serving 3MB is about 1000 tiles (at roughly 3kB per tile). This is a total
> of 100M * 1000 = 1e11 tiles/month, or about 40,000 requests per second
> (a 30-day month is about 2.6 million seconds). If every request needed
> an IO from a hard disk managing 100 IOPS, then you would need about 400
> disks. Having a decent amount of RAM should dramatically cut the number
> of requests reaching the disks. Alternatively you might be able to do
> this all with just a few SSDs. The Intel X25-E is rated at >35,000 IOPS
> for random 4kB reads[2].
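
The estimate above can be reproduced as a small back-of-envelope script. It is a sketch under stated assumptions: a 30-day month, one disk IO per request that misses the cache, and a hypothetical 90% cache hit ratio (not a measured figure) to show how RAM caching cuts the disk count. The helper names are made up for illustration.

```python
# Back-of-envelope capacity estimate using the figures quoted above.
# Assumptions: 100M visits/month, ~1000 tiles (3 MB) per visit, a 30-day
# month, ~100 random IOPS per hard disk, >35,000 IOPS per X25-E SSD, and
# one disk IO for every request that is not served from cache.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 s (30-day month assumed)


def tiles_per_month(visits_per_month: float, tiles_per_visit: float) -> float:
    return visits_per_month * tiles_per_visit


def requests_per_second(tiles_month: float) -> float:
    return tiles_month / SECONDS_PER_MONTH


def disks_needed(req_per_s: float, iops_per_disk: float,
                 cache_hit_ratio: float = 0.0) -> float:
    # Only cache misses reach the disks; hit ratio is a hypothetical input.
    return req_per_s * (1.0 - cache_hit_ratio) / iops_per_disk


tiles = tiles_per_month(100e6, 1000)
rps = requests_per_second(tiles)
print(f"{tiles:.0e} tiles/month, {rps:,.0f} requests/s")
print(f"{disks_needed(rps, 100):.0f} hard disks at 100 IOPS, no cache")
print(f"{disks_needed(rps, 100, cache_hit_ratio=0.9):.0f} hard disks, 90% cache hits")
print(f"{disks_needed(rps, 35_000):.1f} SSDs at 35,000 IOPS, no cache")
```

With these inputs it comes out at about 38,600 requests/s and about 386 hard disks, in line with the rounded figures above.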
>
> I can give you some performance numbers about the OSM server for
> comparison: at last count the OSM tile server had 568M tiles cached
> using about 500GB of disk space[3]. The hardware is described on the
> wiki[4]. It regularly serves 500+ tiles per second @ 50Mbps[5]. This is
> about 40 million HTTP requests per day and several TB of traffic per
> month.
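
The OSM figures can be cross-checked the same way. This is a sketch, assuming 1 GB = 1e9 bytes and treating "500+ tiles per second" as exactly 500; the function names are hypothetical.

```python
# Cross-check of the OSM tile server figures quoted above.
# Inputs: 568M cached tiles in ~500 GB of disk, ~500 tiles/s served.
# Assumptions: 1 GB = 1e9 bytes; "500+ tiles/s" taken as exactly 500.

SECONDS_PER_DAY = 24 * 3600


def average_bytes_per_tile(total_bytes: float, tile_count: float) -> float:
    # Disk space divided by tile count (includes any filesystem overhead).
    return total_bytes / tile_count


def requests_per_day(tiles_per_second: float) -> float:
    return tiles_per_second * SECONDS_PER_DAY


print(f"{average_bytes_per_tile(500e9, 568e6):.0f} bytes of disk per cached tile")
print(f"{requests_per_day(500) / 1e6:.1f} million requests/day")
```

This gives about 880 bytes of disk per cached tile and 43.2 million requests/day, consistent with "about 40 million HTTP requests per day".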
>
>        Jon
>
>
> 1: http://trac.mapnik.org/
> 2: http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-product-brief.pdf
> 3: http://wiki.openstreetmap.org/wiki/Tile_Disk_Usage
> 4: http://wiki.openstreetmap.org/wiki/Servers/yevaud
> 5: http://munin.openstreetmap.org/openstreetmap/yevaud.openstreetmap.html
>
>
>

I will have to give mod_tile a try. Do you have any suggestions on using
nginx/varnish as a cache layer?

Thread overview: 19+ messages
2009-10-17  6:52 optimising filesystem for many small files Viji V Nair
2009-10-17 14:32 ` Eric Sandeen
2009-10-17 17:56   ` Viji V Nair
2009-10-17 22:26     ` Theodore Tso
2009-10-18  9:31       ` Viji V Nair
2009-10-18 11:25         ` Jon Burgess
2009-10-18 12:51           ` Viji V Nair
2009-10-18 11:41         ` Matija Nalis
2009-10-18 13:08           ` Fwd: " Viji V Nair
2009-10-19  7:23             ` Stephen Samuel (gmail)
2009-10-18 13:14           ` Viji V Nair
2009-10-18 15:07             ` Jon Burgess
2009-10-18 16:29               ` Viji V Nair [this message]
2009-10-18 17:15                 ` Jon Burgess
2009-10-18 14:15         ` Peter Grandi
2009-10-18 16:10           ` Viji V Nair
2009-10-18 15:34         ` Eric Sandeen
2009-10-18 16:33           ` Viji V Nair
  -- strict thread matches above, loose matches on Subject: below --
2009-10-17  6:59 Viji V Nair