From: Pete Zaitcev <zaitcev@redhat.com>
To: Project Hail List <hail-devel@vger.kernel.org>
Subject: Data redundancy in tabled
Date: Thu, 29 Oct 2009 17:40:33 -0600
Message-ID: <20091029174033.0203175c@redhat.com>

I have considered how I want to implement the first cut of data redundancy
in tabled, and came up with the following plan:

 - A checksum for the data is to be stored in both the db4 entry and
   the metadata kept by the chunkserver.

 - tabled gets a new "thread" (probably not really a thread)
   that walks keys in db4 by record number, then asks every listed
   chunkserver to verify that the OID is present (in practice, it asks
   to fetch the metadata). If the object is not present, the scanner
   updates the db4. If the reported checksum does not match the one in
   db4, it tells the chunkserver to drop that object (optionally).

   If in the end the object's redundancy is insufficient,
   replication is scheduled.

 - chunkservers run checksums by themselves, without a command
   from tabled, and verify them against the checksums in the metadata
   that they have. If an object fails, it's removed (or marked dead,
   I haven't decided yet).

 - a separate process running on the tabled side works through the
   scheduled replications. The usual applies: control bandwidth and
   avoid hogging the network and servers, batch up OIDs
   between the same chunkservers, etc.
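
As a rough illustration only, one pass of the db4 scanner described
above could be sketched like this. Nothing here is tabled's actual code:
Db4Entry, scan_pass, fetch_metadata, drop_object and min_copies are all
hypothetical names, and the sketch assumes that a copy with a mismatched
checksum does not count towards redundancy, which the plan does not say
explicitly.

```python
from dataclasses import dataclass


@dataclass
class Db4Entry:
    """Hypothetical stand-in for one db4 record."""
    oid: int
    checksum: str
    servers: set  # ids of chunkservers believed to hold the object


def scan_pass(entries, fetch_metadata, drop_object, min_copies=2, drop_bad=True):
    """Walk the entries once and return the OIDs that need replication.

    fetch_metadata(server, oid) is assumed to return the checksum held in
    the chunkserver's metadata, or None if the object is absent.
    drop_object(server, oid) is assumed to ask the chunkserver to drop it.
    """
    to_replicate = []
    for entry in entries:
        # sorted() makes a copy, so the set can be modified inside the loop.
        for server in sorted(entry.servers):
            reported = fetch_metadata(server, entry.oid)
            if reported is None:
                # Object is gone: update the db4 record.
                entry.servers.discard(server)
            elif reported != entry.checksum:
                # Checksum mismatch: optionally tell the chunkserver to drop it.
                if drop_bad:
                    drop_object(server, entry.oid)
                # Assumption: a bad copy is not counted as a valid replica.
                entry.servers.discard(server)
        if len(entry.servers) < min_copies:
            to_replicate.append(entry.oid)
    return to_replicate
```

The returned list is what the separate replication process would then
consume, batching and throttling as it sees fit.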

This all looks nicely parallel except for the db4 scanner,
thanks to decoupling of checksumming from tabled's affairs.
I decided not to do anything about the db4 at this stage, because
any improvements need a new database. I have a vague plan for
that as well, but it must wait. The above is what will make
tabled usable for the general public.
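
To make the decoupling concrete, the chunkserver-side self-check could
look roughly like the sketch below. It is an assumption-laden example,
not chunkserver code: scrub, the dict-based object store and the "dead"
flag are hypothetical, and SHA-256 is used only because the plan does
not name a checksum algorithm.

```python
import hashlib


def scrub(objects, metadata, mark_dead_only=False):
    """Verify every stored object against the checksum in its metadata.

    objects:  dict mapping oid -> object bytes (hypothetical store)
    metadata: dict mapping oid -> {"checksum": hex digest, ...}
    Returns the list of OIDs that failed verification.
    """
    failed = []
    # sorted() makes a copy, so entries can be deleted inside the loop.
    for oid in sorted(objects):
        actual = hashlib.sha256(objects[oid]).hexdigest()
        if actual == metadata[oid]["checksum"]:
            continue
        failed.append(oid)
        if mark_dead_only:
            # Variant where the object is kept but flagged as dead.
            metadata[oid]["dead"] = True
        else:
            # Variant where the failed object is removed outright.
            del objects[oid]
            del metadata[oid]
    return failed
```

Because this runs without any command from tabled, the scanner only
finds out about a removed object on its next pass, when the metadata
fetch comes back empty.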

-- Pete
