From mboxrd@z Thu Jan 1 00:00:00 1970
Sender: morita.kazutaka@gmail.com
In-Reply-To: <4AE07A7F.8000002@redhat.com>
References: <4ADE988B.2070303@lab.ntt.co.jp> <4AE07A7F.8000002@redhat.com>
Date: Fri, 23 Oct 2009 19:41:03 +0900
Message-ID: <8fd1d76d0910230341w7978ac09te203ef34b79a86c6@mail.gmail.com>
From: MORITA Kazutaka
Subject: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org, kvm@vger.kernel.org

On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>>
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning.
>> Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> Very interesting!  From a very brief look at the code, it looks like the
> sheepdog block format driver is a network client that is able to access
> highly available images, yes?

Yes. Sheepdog is a simple key-value storage system that consists of
multiple nodes (a bit similar to Amazon Dynamo, I guess). The qemu
Sheepdog driver (client) divides a VM image into fixed-size objects and
stores them on the key-value storage system.

> If so, is it reasonable to compare this to a cluster file system setup (like
> GFS) with images as files on this filesystem?  The difference would be that
> clustering is implemented in userspace in sheepdog, but in the kernel for a
> clustering filesystem.

I think that the major difference between sheepdog and cluster file
systems such as Google File System, pNFS, etc. is the interface between
clients and the storage system.

> How is load balancing implemented?  Can you move an image transparently
> while a guest is running?  Will an image be moved closer to its guest?

Sheepdog uses consistent hashing to decide where objects are stored; I/O
load is balanced across the nodes. When a new node is added or an
existing node is removed, the hash table changes and the data is moved
between nodes automatically and transparently.

We plan to implement a mechanism to distribute the data not randomly but
intelligently; we could use machine load, the locations of VMs, etc.

> Can you stripe an image across nodes?

Yes, a VM image is divided into multiple objects, and they are stored
across nodes.

> Do you support multiple guests accessing the same image?

A VM image can be attached to any VM, but to only one VM at a time;
multiple running VMs cannot access the same VM image.

> What about fault tolerance - storing an image redundantly on multiple nodes?
Yes, all objects are replicated to multiple nodes.

-- 
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazutaka@lab.ntt.co.jp
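
To illustrate the placement scheme described in the answers above (a VM
image cut into fixed-size objects, consistent hashing to pick nodes, and
each object replicated to several nodes), here is a minimal Python
sketch. It is not Sheepdog's actual code or data layout. The 4 MiB
object size, the "image:index" key naming, SHA-1 as the ring hash, the
virtual-node count, the default of three copies, and the names
`object_key` and `Ring` are all assumptions made for the example.

```python
import bisect
import hashlib

# Assumed object size; the mail only says objects are "fixed-size".
OBJECT_SIZE = 4 * 1024 * 1024


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def object_key(image: str, offset: int) -> str:
    """Key of the object holding byte `offset` of `image` (hypothetical naming)."""
    return f"{image}:{offset // OBJECT_SIZE}"


class Ring:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node contributes `vnodes` points so load spreads more evenly.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def locate(self, key, copies=3):
        """Return up to `copies` distinct nodes, walking clockwise from the key."""
        start = bisect.bisect_right(self._hashes, _hash(key))
        found = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
key = object_key("vm01", 10 * 1024 * 1024)  # byte 10 MiB falls in object 2
print(key, ring.locate(key))
```

With a ring like this, adding a node only inserts new points, so an
object either stays on its previous primary node or moves to the new
node. That is the property that lets data be rebalanced with limited
movement when membership changes.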