From: MORITA Kazutaka
Subject: Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Date: Fri, 23 Oct 2009 19:41:03 +0900
Message-ID: <8fd1d76d0910230341w7978ac09te203ef34b79a86c6@mail.gmail.com>
References: <4ADE988B.2070303@lab.ntt.co.jp> <4AE07A7F.8000002@redhat.com>
In-Reply-To: <4AE07A7F.8000002@redhat.com>
To: Avi Kivity
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, linux-fsdevel@vger.kernel.org

On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>>
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> Very interesting!  From a very brief look at the code, it looks like the
> sheepdog block format driver is a network client that is able to access
> highly available images, yes?

Yes. Sheepdog is a simple key-value storage system that consists of
multiple nodes (a bit similar to Amazon Dynamo, I guess). The qemu
Sheepdog driver (client) divides a VM image into fixed-size objects and
stores them on the key-value storage system.

> If so, is it reasonable to compare this to a cluster file system setup (like
> GFS) with images as files on this filesystem?
> The difference would be that
> clustering is implemented in userspace in sheepdog, but in the kernel for a
> clustering filesystem.

I think that the major difference between Sheepdog and cluster file
systems such as Google File System, pNFS, etc. is the interface between
clients and the storage system: Sheepdog provides block-level volumes
to VMs rather than a file system interface.

> How is load balancing implemented?  Can you move an image transparently
> while a guest is running?  Will an image be moved closer to its guest?

Sheepdog uses consistent hashing to decide where objects are stored, so
I/O load is balanced across the nodes. When a new node is added or an
existing node is removed, the hash table changes and the data is moved
between nodes automatically and transparently.

We plan to implement a mechanism to distribute the data not randomly
but intelligently; we could use machine load, the locations of VMs, etc.

> Can you stripe an image across nodes?

Yes, a VM image is divided into multiple objects, and they are stored
across the nodes.

> Do you support multiple guests accessing the same image?

A VM image can be attached to any VM, but only to one VM at a time;
multiple running VMs cannot access the same VM image.

> What about fault tolerance - storing an image redundantly on multiple nodes?

Yes, all objects are replicated to multiple nodes.

-- 
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazutaka@lab.ntt.co.jp
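
To illustrate the placement scheme described in the reply above (fixed-size
objects, consistent hashing, replication to multiple nodes), here is a
minimal sketch in Python. It is not Sheepdog's code. The 4 MB object size,
the SHA-1 based hash, the virtual-node count, the default of three copies,
and the names Ring, locate, and object_key are all assumptions made for the
example.

```python
import bisect
import hashlib

# Assumed object size; the mail only says objects are "fixed-size".
OBJECT_SIZE = 4 * 1024 * 1024


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` points for the node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        """Drop every point that belongs to the node."""
        self._points = [p for p in self._points if p[1] != node]

    def locate(self, key, copies=3):
        """Return up to `copies` distinct nodes, walking clockwise from the key."""
        if not self._points:
            return []
        start = bisect.bisect_left(self._points, (_hash(key), ""))
        found = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


def object_key(image: str, offset: int) -> str:
    """Name the fixed-size object that holds byte `offset` of `image`."""
    return f"{image}:{offset // OBJECT_SIZE}"


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    keys = [object_key("vm1.img", i * OBJECT_SIZE) for i in range(1000)]
    before = {k: ring.locate(k, copies=1)[0] for k in keys}
    ring.add("node-e")
    after = {k: ring.locate(k, copies=1)[0] for k in keys}
    moved = sum(before[k] != after[k] for k in keys)
    print(f"{moved} of {len(keys)} objects changed primary node")
```

In this sketch, adding a node only changes the primary location of objects
that now map to the new node; all other objects stay where they were, which
is the property that lets data be rebalanced without reshuffling everything.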