qemu-devel.nongnu.org archive mirror
From: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
To: Avi Kivity <avi@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org,
	kvm@vger.kernel.org
Subject: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Date: Fri, 23 Oct 2009 19:41:03 +0900
Message-ID: <8fd1d76d0910230341w7978ac09te203ef34b79a86c6@mail.gmail.com>
In-Reply-To: <4AE07A7F.8000002@redhat.com>

On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity <avi@redhat.com> wrote:
> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>>
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> Very interesting!  From a very brief look at the code, it looks like the
> sheepdog block format driver is a network client that is able to access
> highly available images, yes?

Yes. Sheepdog is a simple key-value storage system that
consists of multiple nodes (a bit similar to Amazon Dynamo, I guess).

The qemu Sheepdog driver (client) divides a VM image into fixed-size
objects and stores them in the key-value storage system.
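
To illustrate the mapping, here is a rough sketch (not the actual driver
code). The 4 MB object size, the key format, and the function names are
assumptions made only for this example. It shows how a byte range of the
image could be split into per-object requests:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size, not stated in this message


def split_request(offset, length, object_size=OBJECT_SIZE):
    """Yield (object_index, offset_in_object, chunk_length) for a byte range."""
    end = offset + length
    while offset < end:
        index = offset // object_size    # which object holds this byte
        within = offset % object_size    # position inside that object
        chunk = min(object_size - within, end - offset)
        yield index, within, chunk
        offset += chunk


def object_key(image_id, index):
    """Build a key for the key-value store (hypothetical format)."""
    return f"{image_id}:{index:08x}"
```

A request that crosses an object boundary simply becomes two smaller
requests, one per object, each addressed by its own key.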

> If so, is it reasonable to compare this to a cluster file system setup (like
> GFS) with images as files on this filesystem?  The difference would be that
> clustering is implemented in userspace in sheepdog, but in the kernel for a
> clustering filesystem.

I think that the major difference between Sheepdog and cluster file
systems such as Google File System, pNFS, etc., is the interface between
clients and the storage system.

> How is load balancing implemented?  Can you move an image transparently
> while a guest is running?  Will an image be moved closer to its guest?

Sheepdog uses consistent hashing to decide where objects are stored; I/O
load is balanced across the nodes. When a new node is added or an
existing node is removed, the hash table changes and the data is moved
between nodes automatically and transparently.
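
As a generic illustration of consistent hashing (a minimal sketch, not
Sheepdog's implementation), the class below places nodes on a hash ring
and assigns each object key to the first node point at or after the
key's hash. The class name, the use of SHA-1, and the number of virtual
nodes are all assumptions of this example:

```python
import bisect
import hashlib


def _hash(key):
    # Any stable hash works for this sketch; SHA-1 is an arbitrary choice.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual points per node, smooths the balance
        self._points = []      # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point whose hash is >= the key's hash, wrapping around.
        i = bisect.bisect_left(self._points, (_hash(key),))
        return self._points[i % len(self._points)][1]
```

The property that matters here is that adding a node only moves the
keys that now fall to the new node; all other keys stay where they were,
so only a fraction of the data has to be copied.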

We plan to implement a mechanism to distribute the data not randomly
but intelligently; we could use machine load, the locations of VMs, etc.

> Can you stripe an image across nodes?

Yes, a VM image is divided into multiple objects, and they are
stored across nodes.

> Do you support multiple guests accessing the same image?

A VM image can be attached to any VM, but only to one VM at a time;
multiple running VMs cannot access the same VM image.

> What about fault tolerance - storing an image redundantly on multiple nodes?

Yes, all objects are replicated to multiple nodes.
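
One common way to choose the replica nodes in a hash-ring design is to
take the next few distinct nodes clockwise from the object's position.
The sketch below shows that idea only; it is an assumption about how
placement could work, not a description of Sheepdog's code, and the
function name and the default of three copies are hypothetical:

```python
import bisect
import hashlib


def _hash(key):
    # Arbitrary stable hash for the sketch.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def replica_nodes(key, nodes, copies=3):
    """Return up to `copies` distinct nodes for `key`, walking the ring."""
    ring = sorted((_hash(n), n) for n in nodes)
    start = bisect.bisect_left(ring, (_hash(key),))
    picked = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in picked:
            picked.append(node)
        if len(picked) == copies:
            break
    return picked
```

If fewer nodes exist than the requested number of copies, the function
returns every node once, so the caller can see that redundancy is
reduced.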


-- 
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazutaka@lab.ntt.co.jp

Thread overview: 32+ messages
2009-10-21  5:13 [Qemu-devel] [ANNOUNCE] Sheepdog: Distributed Storage System for KVM MORITA Kazutaka
2009-10-21  8:28 ` [Qemu-devel] " Nikolai K. Bochev
2009-10-21  8:45 ` Nikolai K. Bochev
2009-10-23  9:59   ` MORITA Kazutaka
2009-10-21  9:08 ` [Qemu-devel] " Dietmar Maurer
2009-10-23 10:06   ` [Qemu-devel] " MORITA Kazutaka
2009-10-23 10:17     ` Chris Webb
2009-10-23 10:26       ` Chris Webb
2009-10-23 11:10     ` [Qemu-devel] " Dietmar Maurer
2009-10-23 11:45       ` Dietmar Maurer
2009-10-22 15:30 ` [Qemu-devel] " Avi Kivity
2009-10-22 16:28   ` Anthony Liguori
2009-10-22 22:09     ` Alexander Graf
2009-10-23 10:41   ` MORITA Kazutaka [this message]
2009-10-23 11:10     ` Alexander Graf
2009-10-23 16:17       ` MORITA Kazutaka
2009-10-23 14:14     ` Javier Guerra
2009-10-23 14:58       ` Chris Webb
2009-10-23 15:10         ` Javier Guerra
2009-10-23 17:05         ` Tomasz Chmielewski
2009-10-25  8:44           ` Dietmar Maurer
2009-10-25 10:55             ` Tomasz Chmielewski
2009-10-23 15:40       ` FUJITA Tomonori
2009-10-25  5:36         ` Avi Kivity
2009-10-25  8:51       ` [Qemu-devel] " Dietmar Maurer
2009-10-26  6:53         ` [Qemu-devel] " MORITA Kazutaka
2009-10-22 18:46 ` Avishay Traeger
2009-10-23 11:22 ` [Qemu-devel] " Dietmar Maurer
2009-10-23 19:39 ` [Qemu-devel] " MORITA Kazutaka
2009-10-23 19:45   ` Javier Guerra
2009-10-24  2:49     ` MORITA Kazutaka
2009-10-28  3:53 ` [Qemu-devel] " MORITA Kazutaka
