From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pete Zaitcev
Subject: Re: chunkd design genesis, storage tech, and support for multiple key/value tables
Date: Tue, 10 Nov 2009 18:48:56 -0700
Message-ID: <20091110184856.5550a4ce@redhat.com>
References: <20091110112409.GA31471@havoc.gtf.org> <20091110093322.313563c2@redhat.com> <4AF9C2EE.3040401@garzik.org> <4AF9C3F3.8050508@garzik.org> <4AF9D123.4090908@garzik.org>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4AF9D123.4090908@garzik.org>
Sender: hail-devel-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: Jeff Garzik
Cc: hail-devel@vger.kernel.org, zaitcev@redhat.com

On Tue, 10 Nov 2009 15:46:27 -0500, Jeff Garzik wrote:

> Now the world has figured out giving a storage device the flexibility to
> manage data on a per-object granular basis simplifies applications, and
> gives underlying storage more ability to optimize.

This is a sleight of hand by OSD vendors, interested in selling for
more dollars per gigabyte.

> Thus, moving to generic key/value storage actually simplified
> applications, by eliminating that mapping.

You're so sure about this, I wonder where it comes from.
The fact in the case of tabled is that it must maintain a database of
keys of its own, primarily because (a) it cannot afford round-trips
into Chunk for every operation, and (b) to locate the chunks.
Both of these databases may be in RAM, but that does not make
them nonexistent.

> However, one glaring difference from SCSI OSD was chunkd's lack of
> administrative partitions. SCSI OSDs provide "partitions" within each
> logical unit (LUN), each of which contains a set of objects within a single
> object id namespace. Therefore, if you consider SCSI OSD object id as
> the key, then SCSI OSD definitely has multiple key/value tables.

This is a completely bogus analogy. OSD vendors want to push their
wares into PC space, where one unit is all a computer has.
But in the cloud we have thousands of Chunk nodes per application.
That is your partitioning right there: it's called .

Look, I would not mind if all this partition stuff were free, but
it's not. You decided to embed a partition into a session, so
 - There's a round trip that you excuse by telling applications
   to keep long-lived connections, thanks a lot
 - requests to different partitions cannot be pipelined (well, not
   easily).

-- Pete