From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Garzik Subject: Re: [Patch 5/7] tabled: Add replication daemon Date: Sun, 15 Nov 2009 05:47:15 -0500 Message-ID: <4AFFDC33.8060607@garzik.org> References: <20091113233605.673bfb9e@redhat.com> <4AFF40DA.5020505@garzik.org> <20091114215318.7b1c4eea@redhat.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20091114215318.7b1c4eea@redhat.com> Sender: hail-devel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Pete Zaitcev Cc: Project Hail List On 11/14/2009 11:53 PM, Pete Zaitcev wrote: > On Sat, 14 Nov 2009 18:44:26 -0500, Jeff Garzik wrote: > >> 1) What is the point of db->del() in rep_add_nid() ? You are in the >> middle of a transaction, and you immediately overwrite that record in >> the same transaction. That is clearly unnecessary work, when db->put() >> will simply overwrite an existing record, if requested. > > That didn't work last I tested, but I forgot what the issue was. > The code is copied from object.c. I'll review it. The only code matching that pattern is object_put_end(), which also performs unnecessary work of needlessly calling db->del(). This should be recognized as nothing more than programmer laziness, enabled by the sharing of __object_del() between DEL and PUT, and the necessity of deleting associated records (ACLs) in conjunction with the main [deletion | update]. That does not apply to rep_add_nid(), which has neither object data nor related records to delete. (patches accepted to remove ->del from PUT path, too, if you wish) >> 2) rep_scan(): I would rather not make the entire daemon non-responsive >> for multiple seconds. >> >> The database is already in multi-threaded mode, and db4 is >> free-threaded, so it would seem to make a lot more sense to simply >> g_thread_create() a thread to do this work. > > True, but multi-threading in tabled is a much bigger undertaking than > just "simply" creating a thread. There's a lot of common state. > I think the 2s problem is less urgent than processing massive > replication more efficiently (from a Chunk down). I only see one global variable reference in the whole nicely-designed, nicely self-contained replica.c file. "a lot of common state" seems like quite an exaggeration. This patch resuscitates the worst application model from the late 1980's, cooperative multi-tasking. I do not want to walk down that road: such programs have weird pauses at weird times, and behave in a very non-deterministic manner. Furthermore, I think you will find that threading gives you a lot more freedom to work in the background. >> Also, there should be no need to scan a chunkd's keys at all, as long as >> the chunkd instance is still communicating with us. > > No, there's a need, although there's no code for it: the scan should > probe keys by requesting their metadata, which forturously includes > checksums. This is how the results of Chunk's self-test will be > communicated to tabled. What is the need? This description provides "what" and "how" but does not answer "why?" Jeff