* 15M files
@ 2005-08-19 21:49 studdugie
  2005-08-19 22:08 ` PFC
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: studdugie @ 2005-08-19 21:49 UTC (permalink / raw)
  To: reiserfs-mailing-list

Hello. I'm looking to replace a couple of Berkeley DB data stores w/
regular file system directories backed by reiserfs (3.6). The reason
is that Berkeley DB is slow, especially for data w/ little or no
locality of reference. I'm posting to this list because I would like
to get some opinions on whether reiserfs is suitable for the job.
Currently there are 15,079,597 records in one of the databases. If I
moved to a directory-based db it would result in 15,079,597 discrete
files ranging in size from 1 byte to 1 KB. I was reading the FAQ on
the namesys site and it mentioned that the r5 hash supports 1,200,000
files w/o collision. Since 15M is about 12.5x greater, I'm expecting
massive amounts of collisions. So the question becomes: how bad should
I expect it to be? Should I assume the file system can handle it, or
will it slow to a crawl? I would really appreciate some feedback from
the experts before I go ripping out the Berkeley DB code.

Thanx.
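
The collision worry above can be checked empirically rather than estimated. The sketch below is a Python port, written from memory, of the r5 function in the kernel's fs/reiserfs/hashes.c. It also assumes that reiserfs 3.6 keeps bits 7 through 30 of the hash and uses the low 7 bits as a generation counter, so that roughly 128 names can share one hash value. Both details are assumptions to verify against the kernel source. The sample names (`rec0`, `rec1`, ...) and the helper `collision_stats` are hypothetical; substitute the real record keys.

```python
from collections import Counter

# Assumed layout: bits 7..30 hold the hash, bits 0..6 the generation counter.
HASH_MASK = 0x7FFFFF80


def r5_hash(name: bytes) -> int:
    """Port of r5 (from memory): unsigned 32-bit arithmetic over signed chars."""
    a = 0
    for b in name:
        if b == 0:                        # the C loop stops at the first NUL byte
            break
        c = b - 256 if b >= 128 else b    # reinterpret the byte as a signed char
        a = (a + (c << 4)) & 0xFFFFFFFF
        a = (a + (c >> 4)) & 0xFFFFFFFF
        a = (a * 11) & 0xFFFFFFFF
    return a


def collision_stats(names):
    """Return (distinct hash values, largest number of names sharing one value)."""
    buckets = Counter(r5_hash(n) & HASH_MASK for n in names)
    return len(buckets), max(buckets.values())


if __name__ == "__main__":
    # Hypothetical key scheme; replace with the real record names.
    sample = [f"rec{i}".encode() for i in range(100_000)]
    distinct, worst = collision_stats(sample)
    print(f"{len(sample)} names, {distinct} distinct hash values, worst bucket {worst}")
    print(f"15,079,597 / 1,200,000 = {15_079_597 / 1_200_000:.2f}x the FAQ figure")
```

If the worst bucket for the real key set stays well below the assumed generation-counter limit, collisions cost some extra lookups but should not prevent file creation. That conclusion holds only as far as the assumptions above do.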


* Re: 15M files
  2005-08-19 21:49 15M files studdugie
@ 2005-08-19 22:08 ` PFC
  2005-08-19 22:09 ` David Masover
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: PFC @ 2005-08-19 22:08 UTC (permalink / raw)
  To: studdugie, reiserfs-mailing-list


On Reiser4, I experienced a really massive speedup when switching from
Berkeley DB to the plain filesystem for my Subversion repository.
But this is on Reiser4. Try it ;)

Also, you can really easily defragment your reiserfs database directory:
just tar and untar it.
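
The tar-and-untar step can be sketched with Python's standard tarfile module. The function name `repack_directory` and the temporary archive name are illustrative, not part of any tool mentioned in this thread. Whether the rewritten files end up less fragmented depends on the filesystem's allocator; the sketch only reproduces the copy-out and copy-back step.

```python
import tarfile
import tempfile
from pathlib import Path


def repack_directory(src: Path, dst: Path) -> None:
    """Pack src into a temporary tar archive, then unpack it into a fresh dst."""
    dst.mkdir(parents=True, exist_ok=False)    # refuse to overwrite an existing tree
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "repack.tar"     # illustrative archive name
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname=".")          # store paths relative to src
        with tarfile.open(archive, "r") as tar:
            if hasattr(tarfile, "fully_trusted_filter"):
                # Newer Pythons: the archive is our own, so keep metadata as-is.
                tar.extractall(dst, filter="fully_trusted")
            else:
                tar.extractall(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = Path(work) / "db"
        src.mkdir()
        (src / "rec0").write_bytes(b"example")
        repack_directory(src, Path(work) / "db-repacked")
        print(sorted(p.name for p in (Path(work) / "db-repacked").iterdir()))
```

Keep dst outside src, and keep the temporary archive on a volume with enough free space for a full copy of the data.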



* Re: 15M files
  2005-08-19 21:49 15M files studdugie
  2005-08-19 22:08 ` PFC
@ 2005-08-19 22:09 ` David Masover
  2005-08-19 22:44   ` studdugie
  2005-08-19 22:30 ` Hans Reiser
  2005-08-21 19:16 ` Lexington Luthor
  3 siblings, 1 reply; 12+ messages in thread
From: David Masover @ 2005-08-19 22:09 UTC (permalink / raw)
  To: studdugie; +Cc: reiserfs-mailing-list


studdugie wrote:
> Hello. I'm looking to replace a couple Berkeley DB data stores w/
> regular file system directories backed by reiserfs (3.6).

Why 3.6, and not 4?  I'll bet v4 is better at this.

> I
> would really appreciate some feedback from the experts before I go
> ripping out the Berkeley DB code.

Others can answer the questions about whether the FS can handle it.  I
would suggest that you consider a real database, though, at least until
Reiser4 gets a good way (sys_reiser4?) of handling multiple files --
otherwise, you can expect to lose a lot of speed to all the open(2) calls.

I don't know anything about the status of sys_reiser4.  If it's almost
done, you may want to wait for it, but I don't think it's almost done.

Maybe MySQL would do the trick?  You may end up having to benchmark it
yourself...
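
The per-file open(2) cost mentioned above can be measured rather than guessed. This is a minimal sketch, assuming a scratch directory on the filesystem under test. The file naming scheme (`rec00000000`, ...), the 512-byte payload, and the function names are made up for illustration. Reads happen in shuffled order to mimic access with little locality of reference.

```python
import random
import tempfile
import time
from pathlib import Path


def make_small_files(root: Path, count: int, size: int = 512) -> list:
    """Create `count` files of `size` bytes each and return their paths."""
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = root / f"rec{i:08d}"    # hypothetical naming scheme
        path.write_bytes(payload)
        paths.append(path)
    return paths


def time_reads(paths, seed: int = 0) -> float:
    """Seconds taken to open, read, and close every file once, in shuffled order."""
    order = list(paths)
    random.Random(seed).shuffle(order)
    start = time.perf_counter()
    for path in order:
        with open(path, "rb") as f:
            f.read()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 10_000
    # Pass dir=... to TemporaryDirectory to place the files on the
    # filesystem being evaluated.
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_small_files(Path(tmp), n)
        elapsed = time_reads(paths)
        print(f"{n} files: {elapsed:.3f} s total, {elapsed / n * 1e6:.1f} us per file")
```

Files that were just written are usually still in the page cache, so this run mostly measures system call and lookup overhead. For cold-cache numbers, remount the filesystem or drop caches between the write and read phases.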


* Re: 15M files
  2005-08-19 21:49 15M files studdugie
  2005-08-19 22:08 ` PFC
  2005-08-19 22:09 ` David Masover
@ 2005-08-19 22:30 ` Hans Reiser
  2005-08-19 22:39   ` studdugie
  2005-08-21 19:16 ` Lexington Luthor
  3 siblings, 1 reply; 12+ messages in thread
From: Hans Reiser @ 2005-08-19 22:30 UTC (permalink / raw)
  To: studdugie; +Cc: reiserfs-mailing-list

studdugie wrote:

>Hello. I'm looking to replace a couple of Berkeley DB data stores w/
>regular file system directories backed by reiserfs (3.6). The reason
>is that Berkeley DB is slow, especially for data w/ little or no
>locality of reference. I'm posting to this list because I would like
>to get some opinions on whether reiserfs is suitable for the job.
>Currently there are 15,079,597 records in one of the databases. If I
>moved to a directory-based db it would result in 15,079,597 discrete
>files ranging in size from 1 byte to 1 KB. I was reading the FAQ on
>the namesys site and it mentioned that the r5 hash supports 1,200,000
>files w/o collision. Since 15M is about 12.5x greater, I'm expecting
>massive amounts of collisions. So the question becomes: how bad should
>I expect it to be? Should I assume the file system can handle it, or
>will it slow to a crawl? I would really appreciate some feedback from
>the experts before I go ripping out the Berkeley DB code.
>
>Thanx.
>
Use V4; it has much better hashing.


* Re: 15M files
  2005-08-19 22:30 ` Hans Reiser
@ 2005-08-19 22:39   ` studdugie
  2005-08-19 22:45     ` David Masover
  2005-08-19 22:50     ` Hans Reiser
  0 siblings, 2 replies; 12+ messages in thread
From: studdugie @ 2005-08-19 22:39 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-mailing-list

I can't use V4 because I can't introduce an unstable kernel on the box
where the app is running.


* Re: 15M files
  2005-08-19 22:09 ` David Masover
@ 2005-08-19 22:44   ` studdugie
  2005-08-20  9:58     ` Christian Iversen
  0 siblings, 1 reply; 12+ messages in thread
From: studdugie @ 2005-08-19 22:44 UTC (permalink / raw)
  To: David Masover; +Cc: reiserfs-mailing-list

MySQL is too slow for the task, even over localhost. I've benchmarked
it. Berkeley DB was 3x faster than MySQL for my use case. The app is
written in Java, so I would be forced to use JDBC to talk to MySQL. The
wrapping/unwrapping overhead is a killer.


* Re: 15M files
  2005-08-19 22:39   ` studdugie
@ 2005-08-19 22:45     ` David Masover
  2005-08-19 22:50     ` Hans Reiser
  1 sibling, 0 replies; 12+ messages in thread
From: David Masover @ 2005-08-19 22:45 UTC (permalink / raw)
  To: studdugie; +Cc: Hans Reiser, reiserfs-mailing-list



studdugie wrote:
> I can't use V4 because I can't introduce an unstable kernel on the box
> where the app is running.

Search the archives.  I wrote a sort of mini-howto on how to set up the
-mm kernel version of Reiser4 on a stable 2.6 kernel.  You don't even
need my patch if you use 2.6.13-rc6 with the latest -mm patch (the one
you're grabbing the reiser4-specific stuff from).

That's if you trust Namesys that it's stable.  It's reasonably stable,
but maybe not mission-critical-stable yet, in that it hasn't been
hammered by millions of users yet.  But, it works for me.



* Re: 15M files
  2005-08-19 22:39   ` studdugie
  2005-08-19 22:45     ` David Masover
@ 2005-08-19 22:50     ` Hans Reiser
  1 sibling, 0 replies; 12+ messages in thread
From: Hans Reiser @ 2005-08-19 22:50 UTC (permalink / raw)
  To: studdugie; +Cc: reiserfs-mailing-list, vs

studdugie wrote:

>I can't use V4 because I can't introduce an unstable kernel on the box
>where the app is running.
>
I suggest you ask vs to send you a patch for the stable kernel on
Wednesday or so, after we send our latest bundle to akpm (we will
probably send it on Monday).


* Re: 15M files
  2005-08-19 22:44   ` studdugie
@ 2005-08-20  9:58     ` Christian Iversen
  2005-08-20 10:23       ` PFC
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Iversen @ 2005-08-20  9:58 UTC (permalink / raw)
  To: reiserfs-list

On Saturday 20 August 2005 00:44, studdugie wrote:
> MySQL is too slow for the task, even over localhost. I've benchmarked
> it. Berkeley DB was 3x faster than MySQL for my use case. The app is
> written in Java, so I would be forced to use JDBC to talk to MySQL. The
> wrapping/unwrapping overhead is a killer.

I've seen a Java library that solves this exact problem. I can't remember
what it was called, but its page talked about massive speedups because it
could somehow avoid the insane wrap/unwrap for each little object. Maybe it
could help.

-- 
Regards,
Christian Iversen


* Re: 15M files
  2005-08-20  9:58     ` Christian Iversen
@ 2005-08-20 10:23       ` PFC
  2005-08-20 12:51         ` Christian Iversen
  0 siblings, 1 reply; 12+ messages in thread
From: PFC @ 2005-08-20 10:23 UTC (permalink / raw)
  To: Christian Iversen, reiserfs-list


Isn't this the RAM-based Prevalence?


* Re: 15M files
  2005-08-20 10:23       ` PFC
@ 2005-08-20 12:51         ` Christian Iversen
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Iversen @ 2005-08-20 12:51 UTC (permalink / raw)
  To: reiserfs-list

On Saturday 20 August 2005 12:23, PFC wrote:
> Isn't this the RAM-based Prevalence?

Yes, I believe that was it. 

-- 
Regards,
Christian Iversen


* Re: 15M files
  2005-08-19 21:49 15M files studdugie
                   ` (2 preceding siblings ...)
  2005-08-19 22:30 ` Hans Reiser
@ 2005-08-21 19:16 ` Lexington Luthor
  3 siblings, 0 replies; 12+ messages in thread
From: Lexington Luthor @ 2005-08-21 19:16 UTC (permalink / raw)
  To: reiserfs-list

studdugie wrote:
> Hello. I'm looking to replace a couple of Berkeley DB data stores w/
> regular file system directories backed by reiserfs (3.6). The reason
> is that Berkeley DB is slow, especially for data w/ little or no
> locality of reference. I'm posting to this list because I would like
> to get some opinions on whether reiserfs is suitable for the job.
> Currently there are 15,079,597 records in one of the databases. If I
> moved to a directory-based db it would result in 15,079,597 discrete
> files ranging in size from 1 byte to 1 KB. I was reading the FAQ on
> the namesys site and it mentioned that the r5 hash supports 1,200,000
> files w/o collision. Since 15M is about 12.5x greater, I'm expecting
> massive amounts of collisions. So the question becomes: how bad should
> I expect it to be? Should I assume the file system can handle it, or
> will it slow to a crawl? I would really appreciate some feedback from
> the experts before I go ripping out the Berkeley DB code.
>
> Thanx.
>

Use the SkipDB library. I have similar code in one of my programs,
and SkipDB is almost 3x the speed of BDB (plus you don't lose control of
transaction safety).

LL

