tabled test corpus?

* tabled test corpus?
@ 2010-03-05 15:31 Jeff Garzik
  2010-03-05 18:33 ` Jeff Garzik
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2010-03-05 15:31 UTC
  To: Project Hail

Can anybody suggest a good test dataset for tabled?

Hopefully something with a million or more keys, where the values are large.

I can certainly generate something like that artificially, but a 
real-world dataset would be nice.

	Jeff

* Re: tabled test corpus?
  2010-03-05 15:31 tabled test corpus? Jeff Garzik
@ 2010-03-05 18:33 ` Jeff Garzik
  2010-03-05 18:47   ` Fabian Deutsch
  2010-03-05 21:15   ` Colin McCabe
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Garzik @ 2010-03-05 18:33 UTC
  To: Project Hail; +Cc: Pete Zaitcev

On 03/05/2010 10:31 AM, Jeff Garzik wrote:
> Can anybody suggest a good test dataset for tabled?
>
> Hopefully something with a million or more keys, where the values are
> large.
>
> I can certainly generate something like that artificially, but a
> real-world dataset would be nice.

Still looking for a good, real-world data set.

A synthetic store+retrieve test of 1m keys @ 16K values worked without a 
hitch.  I documented this on 
http://hail.wiki.kernel.org/index.php/Extended_status
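
For anyone who wants to reproduce a synthetic run of this shape, a minimal sketch in Python follows. It is not the script used for the test above and it does not use any tabled or S3 client API: the names synthetic_pairs and store_and_verify are hypothetical, and a plain dict stands in for the store. Replace the dict with whatever client object you use for puts and gets.

```python
import hashlib
import random


def synthetic_pairs(num_keys, value_size, seed=0):
    """Yield (key, value) pairs with reproducible pseudo-random values.

    value_size is in bytes and is assumed to be greater than zero.
    """
    rng = random.Random(seed)
    for i in range(num_keys):
        key = f"key-{i:08d}"
        # Draw value_size * 8 random bits and pack them into value_size bytes.
        value = rng.getrandbits(value_size * 8).to_bytes(value_size, "big")
        yield key, value


def store_and_verify(pairs, store):
    """Store every pair, read each one back, and return the mismatched keys.

    `store` is any mapping-like object; a dict is only a stand-in here.
    """
    digests = {}
    for key, value in pairs:
        store[key] = value
        digests[key] = hashlib.sha256(value).hexdigest()
    return [
        key
        for key, digest in digests.items()
        if hashlib.sha256(store[key]).hexdigest() != digest
    ]


if __name__ == "__main__":
    # Small run for illustration; 1m keys @ 16K would be (1_000_000, 16 * 1024).
    bad = store_and_verify(synthetic_pairs(1000, 16 * 1024), {})
    print("mismatched keys:", len(bad))
```

Only the SHA-256 digests are kept on the client side, so memory use stays small even when the store is remote and the key count is large.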

	Jeff



* Re: tabled test corpus?
  2010-03-05 18:33 ` Jeff Garzik
@ 2010-03-05 18:47   ` Fabian Deutsch
  2010-03-05 21:15   ` Colin McCabe
  1 sibling, 0 replies; 5+ messages in thread
From: Fabian Deutsch @ 2010-03-05 18:47 UTC
  To: Jeff Garzik; +Cc: Project Hail, Pete Zaitcev

On Friday, 05.03.2010, at 13:33 -0500, Jeff Garzik wrote:
> On 03/05/2010 10:31 AM, Jeff Garzik wrote:
> > Can anybody suggest a good test dataset for tabled?
> >
> > Hopefully something with a million or more keys, where the values are
> > large.
> >
> > I can certainly generate something like that artificially, but a
> > real-world dataset would be nice.
> 
> Still looking for a good, real-world data set.

You might have a look at wikipedia's articles:
http://static.wikipedia.org/

> A synthetic store+retrieve test of 1m keys @ 16K values worked without a 
> hitch.  I documented this on 
> http://hail.wiki.kernel.org/index.php/Extended_status

Nice. :)

- fabian

> 	Jeff

* Re: tabled test corpus?
  2010-03-05 18:33 ` Jeff Garzik
  2010-03-05 18:47   ` Fabian Deutsch
@ 2010-03-05 21:15   ` Colin McCabe
  2010-03-05 21:28     ` Jeff Garzik
  1 sibling, 1 reply; 5+ messages in thread
From: Colin McCabe @ 2010-03-05 21:15 UTC
  To: Jeff Garzik; +Cc: Project Hail, Pete Zaitcev

Random thoughts:

Maybe something like a freely available dictionary would work, with
the key as the word, and the value as the definition.

You could grab git commits from the Linux kernel and make the key the
SHA, and the value the patch.

There's a lot of text in Project Gutenberg. I guess you'd have to
decide what you want your average key / value lengths to be-- I think
most books there are longer than 16K. Maybe you could make the key
(book, page_number).
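
A rough sketch of the (book, page_number) idea in Python, assuming a "page" is simply a fixed-size slice of the book's UTF-8 bytes. The function name paginate, the page_bytes parameter, and the 16 KB default are illustrative choices, not anything tabled or Project Gutenberg defines.

```python
def paginate(book_id, text, page_bytes=16 * 1024):
    """Yield ((book_id, page_number), value) pairs for one book.

    Pages are numbered from 1. Each value is a bytes slice of at most
    page_bytes; the last page may be shorter.
    """
    data = text.encode("utf-8")
    starts = range(0, len(data), page_bytes)
    for page_number, start in enumerate(starts, start=1):
        yield (book_id, page_number), data[start:start + page_bytes]


if __name__ == "__main__":
    sample = "lorem ipsum " * 5000  # stand-in for a downloaded book
    for key, value in paginate("sample-book", sample):
        print(key, len(value))
```

Slicing on bytes can cut a multi-byte character in half at a page boundary, which is harmless for a storage test because concatenating the pages in order gives back the original bytes. Raising page_bytes is the knob for pushing the average value size up.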

Colin

P.S. I've been meaning to set up a bigger tabled installation myself,
as soon as I get some time.

On Fri, Mar 5, 2010 at 10:33 AM, Jeff Garzik <jeff@garzik.org> wrote:
> On 03/05/2010 10:31 AM, Jeff Garzik wrote:
>>
>> Can anybody suggest a good test dataset for tabled?
>>
>> Hopefully something with a million or more keys, where the values are
>> large.
>>
>> I can certainly generate something like that artificially, but a
>> real-world dataset would be nice.
>
> Still looking for a good, real-world data set.
>
> A synthetic store+retrieve test of 1m keys @ 16K values worked without a
> hitch.  I documented this on
> http://hail.wiki.kernel.org/index.php/Extended_status
>
>        Jeff

* Re: tabled test corpus?
  2010-03-05 21:15   ` Colin McCabe
@ 2010-03-05 21:28     ` Jeff Garzik
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Garzik @ 2010-03-05 21:28 UTC
  To: Colin McCabe; +Cc: Project Hail, Pete Zaitcev

On 03/05/2010 04:15 PM, Colin McCabe wrote:
> Random thoughts:
>
> Maybe something like a freely available dictionary would work, with
> the key as the word, and the value as the definition.
>
> You could grab git commits from the Linux kernel and make the key the
> SHA, and the value the patch.
>
> There's a lot of text in Project Gutenberg. I guess you'd have to
> decide what you want your average key / value lengths to be-- I think
> most books there are longer than 16K. Maybe you could make the key
> (book, page_number).

Yeah, I am definitely looking for something much larger than 16K.  S3 
values can run into the gigabytes per value...

	Jeff



