All of lore.kernel.org
 help / color / mirror / Atom feed
* Design challenges in chunkd self-checking
@ 2009-12-22 21:41 Pete Zaitcev
  2009-12-22 22:43 ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Pete Zaitcev @ 2009-12-22 21:41 UTC (permalink / raw)
  To: Project Hail List

I'm looking into adding self-checking to chunkd. This involves basically
a process that re-reads everything stored in the chunkserver and verifies
that it's still ok. Nothing can be simpler, right?

So, current problems for which I'd like input are:

 - Scheduling and deconflicting with normal operation.

   Run "genisofs" in your Fedora desktop and your Firefox is DEAD.
   It is also the reason why everyone does rpm -e mlocate the first thing
   after the installation. The effect of massive data access blowing
   away caches is very drastic in a regular Linux.
   So, I have to have a good way to keep self-checkig from interfering
   with normal service of a chunkserver.
   Also, need to save power instead of burning it on re-reading data.

 - Consistency.

   Returning wrong checksums for an object that is being updated may
   lead to us deciding to drop a perfectly good object, which is
   unacceptable (especially when redundancy is impaired already).
   So, I need some kind of locking, or logging, or invalidation...

-- Pete

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2009-12-22 21:41 Design challenges in chunkd self-checking Pete Zaitcev
@ 2009-12-22 22:43 ` Jeff Garzik
  2009-12-23  1:40   ` Pete Zaitcev
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2009-12-22 22:43 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Project Hail List

On 12/22/2009 04:41 PM, Pete Zaitcev wrote:
> I'm looking into adding self-checking to chunkd. This involves basically
> a process that re-reads everything stored in the chunkserver and verifies
> that it's still ok. Nothing can be simpler, right?
>
> So, current problems for which I'd like input are:
>
>   - Scheduling and deconflicting with normal operation.
>
>     Run "genisofs" in your Fedora desktop and your Firefox is DEAD.
>     It is also the reason why everyone does rpm -e mlocate the first thing
>     after the installation. The effect of massive data access blowing
>     away caches is very drastic in a regular Linux.
>     So, I have to have a good way to keep self-checkig from interfering
>     with normal service of a chunkserver.
>     Also, need to save power instead of burning it on re-reading data.

The problem seems to revolve around two variables:

* last-checked time.  You wouldn't want to check a single individual 
object more than once every N hours|days|weeks.

* maximum bytes-per-second.  You wouldn't want to exceed a useful bound 
for throughput.

Perhaps the last variable could be calculated by observing disk 
throughput over time, in conjunction with the number of objects and 
their sizes, resulting in an idea of the total time required to check 
the entire dataset.

And if we start keeping data like this, we might want to move metadata 
from the beginning of each object to a TC database.  That might speed up 
fs_list_objs and a couple other operations, too.


>   - Consistency.
>
>     Returning wrong checksums for an object that is being updated may
>     lead to us deciding to drop a perfectly good object, which is
>     unacceptable (especially when redundancy is impaired already).
>     So, I need some kind of locking, or logging, or invalidation...

It is normal and reasonable to maintain global information about all 
in-progress operations.  Caching systems do that, for example, to ensure 
multiple cache requests for object A do not initiate multiple 
simultaneous back-end requests for object A.

For the purposes of verification, I would just skip objects that are 
actively being written-to.  Those are, by definition, too new to 
probably need verification anyway.

BTW, in case this is helpful, chunkd's backend writes a zeroed metadata 
header to the beginning of each object.  The metadata header is only 
updated with "real" values after the final data byte is written.

	Jeff


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2009-12-22 22:43 ` Jeff Garzik
@ 2009-12-23  1:40   ` Pete Zaitcev
  2009-12-23  3:36     ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Pete Zaitcev @ 2009-12-23  1:40 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Project Hail List

On Tue, 22 Dec 2009 17:43:58 -0500
Jeff Garzik <jeff@garzik.org> wrote:

> It is normal and reasonable to maintain global information about all 
> in-progress operations.  Caching systems do that, for example, to ensure 
> multiple cache requests for object A do not initiate multiple 
> simultaneous back-end requests for object A.

Unfortunately, this requires a data structure that permits searching.
Since I lack the classical CS education, I cannot select an appropriate
structure beyond "double-linked list and a hope that we'll never see
more than 10 simultaneous I/Os".

-- Pete

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2009-12-23  1:40   ` Pete Zaitcev
@ 2009-12-23  3:36     ` Jeff Garzik
  2010-01-05 20:47       ` Pete Zaitcev
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2009-12-23  3:36 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Project Hail List

On 12/22/2009 08:40 PM, Pete Zaitcev wrote:
> On Tue, 22 Dec 2009 17:43:58 -0500
> Jeff Garzik<jeff@garzik.org>  wrote:
>
>> It is normal and reasonable to maintain global information about all
>> in-progress operations.  Caching systems do that, for example, to ensure
>> multiple cache requests for object A do not initiate multiple
>> simultaneous back-end requests for object A.
>
> Unfortunately, this requires a data structure that permits searching.
> Since I lack the classical CS education, I cannot select an appropriate
> structure beyond "double-linked list and a hope that we'll never see
> more than 10 simultaneous I/Os".

What are the operational parameters?

* at beginning of object write, add entry to ADT (abstract data type, a 
list/table/whatever)

* at end of object write, remove entry from ADT

* lookup entry in ADT, where search key == object id.  if search 
succeeds, that object is guaranteed to be young and not need 
verification.  if search fails, the object (a) does not exist, or (b) 
has been completely written to disk.

Seems like a mutex-wrapped GLib hash table would work...

	Jeff



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2009-12-23  3:36     ` Jeff Garzik
@ 2010-01-05 20:47       ` Pete Zaitcev
  2010-01-05 21:02         ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Pete Zaitcev @ 2010-01-05 20:47 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Project Hail List

On Tue, 22 Dec 2009 22:36:16 -0500
Jeff Garzik <jeff@garzik.org> wrote:

> Seems like a mutex-wrapped GLib hash table would work...

I dunno about this... See, I think it's like kernel timers: there's a
lot of premium on having add and remove quick, and the rest is whatever.
The important part is not to penalize the latency of normal requests
only to make self-checking faster. That process takes hours to loop
anyway, maybe days.

I went with a list for now.

-- Pete

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2010-01-05 20:47       ` Pete Zaitcev
@ 2010-01-05 21:02         ` Jeff Garzik
  2010-01-05 21:39           ` Pete Zaitcev
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2010-01-05 21:02 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Project Hail List

On 01/05/2010 03:47 PM, Pete Zaitcev wrote:
> On Tue, 22 Dec 2009 22:36:16 -0500
> Jeff Garzik<jeff@garzik.org>  wrote:
>
>> Seems like a mutex-wrapped GLib hash table would work...
>
> I dunno about this... See, I think it's like kernel timers: there's a
> lot of premium on having add and remove quick, and the rest is whatever.
> The important part is not to penalize the latency of normal requests
> only to make self-checking faster. That process takes hours to loop
> anyway, maybe days.
>
> I went with a list for now.

How is an O(n) list faster than an O(1) hash table?

	Jeff



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2010-01-05 21:02         ` Jeff Garzik
@ 2010-01-05 21:39           ` Pete Zaitcev
  2010-01-05 21:53             ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Pete Zaitcev @ 2010-01-05 21:39 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Project Hail List

On Tue, 05 Jan 2010 16:02:58 -0500
Jeff Garzik <jeff@garzik.org> wrote:
> On 01/05/2010 03:47 PM, Pete Zaitcev wrote:
> > On Tue, 22 Dec 2009 22:36:16 -0500
> > Jeff Garzik<jeff@garzik.org>  wrote:
> >
> >> Seems like a mutex-wrapped GLib hash table would work...
> >
> > I dunno about this... See, I think it's like kernel timers: there's a
> > lot of premium on having add and remove quick, and the rest is whatever.
> > The important part is not to penalize the latency of normal requests
> > only to make self-checking faster. That process takes hours to loop
> > anyway, maybe days.
> >
> > I went with a list for now.
> 
> How is an O(n) list faster than an O(1) hash table?

Do you know what the constant is in that hash table (which is not
O(1) in case of conflicts)?

Notice that the glib's hash table does NOT include a hash function.
This is something I wanted to discuss too. What would you use?
I researched what's available, and all of them come with rather
weird licenses (well, ok, Google hash is under the new BSD, which
works for us... or does it?).

The code is supposed to be easy to change over to any other
lookup structure, hopefuly anyway.

-- Pete

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2010-01-05 21:39           ` Pete Zaitcev
@ 2010-01-05 21:53             ` Jeff Garzik
  2010-01-05 22:10               ` Pete Zaitcev
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2010-01-05 21:53 UTC (permalink / raw)
  To: Pete Zaitcev; +Cc: Project Hail List

On 01/05/2010 04:39 PM, Pete Zaitcev wrote:
> On Tue, 05 Jan 2010 16:02:58 -0500
> Jeff Garzik<jeff@garzik.org>  wrote:
>> On 01/05/2010 03:47 PM, Pete Zaitcev wrote:
>>> On Tue, 22 Dec 2009 22:36:16 -0500
>>> Jeff Garzik<jeff@garzik.org>   wrote:
>>>
>>>> Seems like a mutex-wrapped GLib hash table would work...
>>>
>>> I dunno about this... See, I think it's like kernel timers: there's a
>>> lot of premium on having add and remove quick, and the rest is whatever.
>>> The important part is not to penalize the latency of normal requests
>>> only to make self-checking faster. That process takes hours to loop
>>> anyway, maybe days.
>>>
>>> I went with a list for now.
>>
>> How is an O(n) list faster than an O(1) hash table?
>
> Do you know what the constant is in that hash table (which is not
> O(1) in case of conflicts)?

The GLib implementation does its best to ensure each hash slot contains 
either zero or one entries, resizing the hash table if need be, as 
entries are inserted.

It is O(1) except in extremely pathological cases.


> Notice that the glib's hash table does NOT include a hash function.
> This is something I wanted to discuss too. What would you use?
> I researched what's available, and all of them come with rather
> weird licenses (well, ok, Google hash is under the new BSD, which
> works for us... or does it?).

If you have a constant pointer value [for the lifetime of the hash table 
entry], use g_direct_hash.  If you have a nul-terminated string, GLib 
also has g_str_hash.

Otherwise, I would pick something simple like djb's hash.  There is a 
version in nfs4d/main.c that you can grab.  djb put his code into the 
public domain, which makes licensing easy.

	Jeff




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Design challenges in chunkd self-checking
  2010-01-05 21:53             ` Jeff Garzik
@ 2010-01-05 22:10               ` Pete Zaitcev
  0 siblings, 0 replies; 9+ messages in thread
From: Pete Zaitcev @ 2010-01-05 22:10 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Project Hail List

On Tue, 05 Jan 2010 16:53:55 -0500
Jeff Garzik <jeff@garzik.org> wrote:

> If you have a constant pointer value [for the lifetime of the hash table 
> entry], use g_direct_hash.  If you have a nul-terminated string, GLib 
> also has g_str_hash.

Of course I considered these, but thanks to our keys being a random
byte string now (remember whose idea it was?), and not having all
keys listed somewhere (obviously), neither works.

> Otherwise, I would pick something simple like djb's hash.  There is a 
> version in nfs4d/main.c that you can grab.  djb put his code into the 
> public domain, which makes licensing easy.

I'll give it a thought, thanks. I thought I liked Hsui's SuperFastHash,
but oh well.

-- Pete

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-01-05 22:10 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-22 21:41 Design challenges in chunkd self-checking Pete Zaitcev
2009-12-22 22:43 ` Jeff Garzik
2009-12-23  1:40   ` Pete Zaitcev
2009-12-23  3:36     ` Jeff Garzik
2010-01-05 20:47       ` Pete Zaitcev
2010-01-05 21:02         ` Jeff Garzik
2010-01-05 21:39           ` Pete Zaitcev
2010-01-05 21:53             ` Jeff Garzik
2010-01-05 22:10               ` Pete Zaitcev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.