From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lexington Luthor
Subject: Re: 15M files
Date: Sun, 21 Aug 2005 20:16:27 +0100
References: <5a59ce53050819144970d07ab0@mail.gmail.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Errors-To: flx@namesys.com
In-Reply-To: <5a59ce53050819144970d07ab0@mail.gmail.com>
Sender: news
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: reiserfs-list@namesys.com

studdugie wrote:
> Hello. I'm looking to replace a couple Berkeley DB data stores w/
> regular file system directories backed by reiserfs (3.6). The reason
> is Berkeley DB is slow especially for data w/ little or no locality of
> reference. I'm posting to this list because I would like to get some
> opinions on whether reiserfs is suitable for the job. Currently there are
> 15,079,597 records in 1 of the databases. If I moved to a directory
> based db it would result in 15,079,597 discrete files ranging in size
> from 1 byte to 1kb. I was reading the FAQ on the namesys site and it
> mentioned that the r5 hash supports 1,200,000 files w/o collision.
> Since 15M is 12.5x greater I'm expecting massive amounts of
> collisions. So the question becomes how bad should I expect it to be?
> Should I assume the file system can handle it or slow to a crawl? I
> would really appreciate some feedback from the experts before I go
> ripping out the Berkeley DB code.
>
> Thanx.
>

Use the SkipDB library - I have similar code in one of my programs,
and SkipDB is almost 3x the speed of BDB (plus you don't lose control of
transaction safety).

LL