From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff Garzik
Subject: Re: chunkd self-check question
Date: Thu, 03 Dec 2009 00:32:36 -0500
Message-ID: <4B174D74.8080407@garzik.org>
References: <20091202205532.208edab6@redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20091202205532.208edab6@redhat.com>
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Pete Zaitcev
Cc: Project Hail List

On 12/02/2009 10:55 PM, Pete Zaitcev wrote:
> I need a way to scan all objects that a Chunk node keeps. There's
> a function that does it already: fs_list_objs. Looking at it, is there
> a reason why it uses readdir instead of tchdbiternext?

The TC database master.tch stores table data, not object data.
master.tch is a (table name)->(table identifier) lookup table.

The object "database" remains 100% filesystem-based, with a fixed-length
metadata header prepended to each object. That means per-object lookup
and retrieval is super-quick, with the kernel's pagecache and i/dcaches
working hard for us.

However, the list-objects operation requires that we open each object's
file, and read the fixed-length metadata header. Objects not belonging
to the authenticated user are then discarded from the list-objects
output. A truly server-punishing operation -- opening and reading EVERY
file's fixed-length header -- but the thought was that list-objects
would be so infrequent (once daily? once per cluster boot?)
that it would not matter much. If that assumption turns out to be
invalid or unwise, we can certainly change things (see below).

> In case of
> self-checking, scanning directories is undesirable, because if an
> object somehow (e.g. a hardware failure) ends up existing in the
> filesystem but without a corresponding entry in the TC database, it
> will incorrectly count as present.

I had considered storing object metadata in an additional TC database,
for a couple of reasons: much faster list-objects and object metadata
retrieval, and storage of small objects.

If some future chunkd stores object metadata in a TC database, yes,
inconsistencies could arise.

	Jeff