From: Javier Guerra
Date: Fri, 23 Oct 2009 09:14:29 -0500
Subject: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
To: MORITA Kazutaka
Cc: linux-fsdevel@vger.kernel.org, Avi Kivity, kvm@vger.kernel.org, qemu-devel@nongnu.org

On Fri, Oct 23, 2009 at 5:41 AM, MORITA Kazutaka wrote:
> On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
>> If so, is it reasonable to compare this to a cluster file system setup (like
>> GFS) with images as files on this filesystem?  The difference would be that
>> clustering is implemented in userspace in sheepdog, but in the kernel for a
>> clustering filesystem.
>
> I think that the major difference between sheepdog and cluster file
> systems such as Google File System, pNFS, etc. is the interface between
> clients and the storage system.

note that GFS is "Global File System" (written by Sistina, the same folks behind LVM, and bought by Red Hat). Google File System is a different thing, and ironically its client/storage interface is a little more like sheepdog's and unlike a regular cluster filesystem's.

>> How is load balancing implemented?  Can you move an image transparently
>> while a guest is running?  Will an image be moved closer to its guest?
>
> Sheepdog uses consistent hashing to decide where objects are stored; I/O
> load is balanced across the nodes. When a new node is added or an
> existing node is removed, the hash table changes and the data is moved
> across nodes automatically and transparently.
>
> We plan to implement a mechanism to distribute the data not randomly
> but intelligently; we could use machine load, the locations of VMs, etc.

i don't have much hands-on experience with consistent hashing, but it sounds reasonable to make each node's ring segment proportional to its storage capacity. dynamic load balancing seems a tougher nut to crack, especially while keeping all clients' mappings consistent.
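to make the capacity idea concrete, here's a minimal sketch of capacity-weighted consistent hashing with virtual nodes -- this is NOT sheepdog's actual code, just the general technique, and every name in it is made up. each node gets virtual points on the ring in proportion to its capacity, so a node with 3x the storage owns roughly 3x the ring:

/* sketch: capacity-weighted consistent hashing with virtual nodes */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct vnode {
    uint64_t hash;  /* position of this virtual point on the ring */
    int node;       /* physical node that owns it */
};

static struct vnode ring[4096];
static int nr_vnodes;

/* FNV-1a; just a stand-in for whatever hash the real system uses */
static uint64_t hash64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;
    }
    return h;
}

static int vnode_cmp(const void *a, const void *b)
{
    uint64_t x = ((const struct vnode *)a)->hash;
    uint64_t y = ((const struct vnode *)b)->hash;
    return x < y ? -1 : x > y;
}

/* one virtual point per 10 GB of capacity, so a node's share of
 * the ring is proportional to its storage */
static void add_node(int node, int capacity_gb)
{
    int i, points = capacity_gb / 10;
    for (i = 0; i < points; i++) {
        char key[64];
        snprintf(key, sizeof(key), "node%d-%d", node, i);
        ring[nr_vnodes].hash = hash64(key);
        ring[nr_vnodes].node = node;
        nr_vnodes++;
    }
    qsort(ring, nr_vnodes, sizeof(ring[0]), vnode_cmp);
}

/* an object maps to the first virtual point clockwise of its hash */
static int lookup(const char *object_id)
{
    int i;
    uint64_t h = hash64(object_id);
    for (i = 0; i < nr_vnodes; i++)
        if (ring[i].hash >= h)
            return ring[i].node;
    return ring[0].node;  /* wrap around the ring */
}

int main(void)
{
    add_node(0, 100);  /* 100 GB -> 10 points */
    add_node(1, 300);  /* 300 GB -> 30 points, 3x the ring share */
    printf("object 'vdi-42-obj-7' -> node %d\n", lookup("vdi-42-obj-7"));
    return 0;
}

the nice property here is that a node's points are scattered uniformly around the ring, so removing one node only remaps the objects that hashed to its points -- which is what makes rebalancing on membership changes cheap.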
>> Do you support multiple guests accessing the same image?
>
> A VM image can be attached to any VM, but to only one VM at a time;
> multiple running VMs cannot access the same VM image.

this is a must-have safety measure; but a 'manual override' is quite useful for those who know how to manage a cluster-aware filesystem inside a VM image, maybe like Xen's "w!" flag does. just be sure to avoid distributed caching for a shared image!

in all, a great project, and with such a clean patch into KVM/Qemu, i have high hopes of it making it into regular use. i'd just like to add my '+1' votes on both getting rid of the JVM dependency and using block devices (usually LVM) instead of ext3/btrfs.

-- 
Javier