From: Matti Aarnio <matti.aarnio@zmailer.org>
To: Xin Zhao <uszhaoxin@gmail.com>
Cc: mingz@ele.uri.edu, mikado4vn@gmail.com,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Question regarding to store file system metadata in database
Date: Mon, 20 Mar 2006 23:08:28 +0200
Message-ID: <20060320210828.GG3927@mea-ext.zmailer.org>
In-Reply-To: <4ae3c140603201136q7e61963dy635bb2c6047f0bc2@mail.gmail.com>

On Mon, Mar 20, 2006 at 02:36:51PM -0500, Xin Zhao wrote:
> 
> OK. Now I have more experimental results.
> 
> After excluding the cost of reading the file list and doing stat(),
> the insertion rate becomes 587/sec, instead of 300/sec. The query rate
> is 2137/sec. I am running mysql 4.1.11. FC4, 2.8G CPU and 1G mem.
> 
> 2137/sec seems to be good enough to handle pathname to inode
> resolving.  Does anyone have statistics on how many files are opened
> on a busy file system?
> 
> Xin

What is wrong here, I think, is your preset assumption that using
a proper modern database will be faster.   Yes, perhaps it will,
under some specific conditions.

As Gene Amdahl pointed out long ago, optimizing something that
forms 1% of the load will speed things up by at most that 1%.
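
To put a number on that bound, here is a minimal sketch of Amdahl's law.
The function name and the sample figures are illustrative assumptions,
not measurements from this thread.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the total run time is made
    `factor` times faster and the rest is left unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical case: the optimized part is 1% of the load.
# Even an effectively infinite speedup of that part cannot push the
# overall speedup past 1 / 0.99, which is roughly 1%.
for factor in (2, 10, 1e9):
    print(factor, amdahl_speedup(0.01, factor))
```

The larger the share of the load that the optimized part represents,
the more the same local speedup is worth, which is why measuring that
share comes first.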

Could you add accounting instrumentation to the primitive directory
management operations ?  How many directory inserts/removes/lookups
per mounted filesystem (or for the entire system), including dname-cache
operations (they are already instrumented, I think), are used in
normal system operation ?

If your system's behaviour shows that more than 1% of those operations
are something other than lookups, try to find out _why_ that is.

So far, Linux has optimized filesystem directory reads to the maximum.



Long ago I had a problem where I needed to insert into an application-
specific database from a data origination system, and I also needed fast
batch replication from one dataset copy to another.  Hash keying made
inserts _slow_.  B-tree indexing with inserts done in key order made
things fast.   Not flushing the database at every insert made it faster
almost linearly with the flush interval.  Not flushing the database
except at batch end made it maximally fast: around 100 000 inserts per
second, but the data had to be pre-sorted.  (This was on a single SCSI
disk back in 1996.)  We had a requirement to do batch inserts as fast as
possible, and likewise for the batch replication that was used for
maintenance, so a ten-thousand-fold speedup was well worth the added
complexity in the software.
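
The pattern in the paragraph above (pre-sort the batch, insert in key
order into a B-tree index, flush rarely or only at batch end) can be
sketched as follows.  This is only an illustration under stated
assumptions: SQLite is a stand-in for the 1996 database, which is not
named here, and the table, column, and function names are hypothetical.

```python
import sqlite3


def batch_insert(rows, flush_every=None, db_path=":memory:"):
    """Insert (path, inode) pairs in key order and return the row count.

    flush_every=None commits once at batch end; flush_every=N commits
    after every N inserts.  Use a real file for db_path to observe the
    cost of flushing; the in-memory default only checks the logic.
    """
    conn = sqlite3.connect(db_path)
    # SQLite backs a PRIMARY KEY with a B-tree index, so sorted input
    # arrives in index order.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (path TEXT PRIMARY KEY, inode INTEGER)"
    )
    count = 0
    for path, inode in sorted(rows):  # pre-sort the batch by key
        conn.execute("INSERT INTO meta VALUES (?, ?)", (path, inode))
        count += 1
        if flush_every and count % flush_every == 0:
            conn.commit()  # periodic flush
    conn.commit()  # final flush at batch end
    total = conn.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
    conn.close()
    return total


# Hypothetical sample data: commit only once, at the end of the batch.
rows = [("/usr/bin/ls", 1001), ("/etc/passwd", 1002), ("/bin/sh", 1003)]
print(batch_insert(rows))
```

Passing flush_every=1 reproduces the flush-at-every-insert case for
comparison against a file-backed database.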

That database was also "read-only" to about 4 sigma.


 /Matti Aarnio
