From: Avi Kivity <avi@redhat.com>
To: jim owens <jowens@hp.com>
Cc: Stephan von Krawczynski <skraw@ithnet.com>, linux-btrfs@vger.kernel.org
Subject: Re: Some very basic questions
Date: Wed, 22 Oct 2008 16:25:05 +0200
Message-ID: <48FF37C1.1010001@redhat.com>
In-Reply-To: <48FF3520.2000308@hp.com>

jim owens wrote:
> Avi Kivity wrote:
>> jim owens wrote:
>>>
>>> Remember that the device bandwidth is the limiter so even
>>> when each host has a dedicated path to the device (as in
>>> dual port SAS or FC), that 2nd host cuts the throughput by
>>> more than 1/2 with uncoordinated seeks and transfers.
>>
>> That's only a problem if there is a single shared device.  Since 
>> btrfs supports multiple devices, each host could own a device set and 
>> access from other hosts would be through the owner.  You would need 
>> RDMA to get reasonable performance and some kind of dual-porting to 
>> get high availability.  Each host could control the allocation tree 
>> for its devices.
>
> No.  Every device including a monster $$$ array has the problem.
>
> As I said before, unless the application is partitioned
> there is always data host2 needs from host1's disk and that
> slows down host1.

The CPU load should not be significant if you have RDMA.  Or are you 
talking about the seek load?  Since host1's load should be distributed 
over all devices in the system, overall seek capacity increases as you 
add more nodes.
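
As a rough illustration of the seek-capacity argument, here is a back-of-the-envelope model. It assumes one device per node by default, that every host's seeks are spread evenly over all devices, and that network and RDMA costs are ignored. The function names and the numbers in the usage note are hypothetical, not measurements.

```python
def aggregate_seek_capacity(num_nodes, seeks_per_device, devices_per_node=1):
    """Total seeks/s the cluster can serve if every device can be kept busy."""
    return num_nodes * devices_per_node * seeks_per_device


def per_device_load(host_seek_rates, devices_per_node=1):
    """Seeks/s landing on each device when all hosts' seeks are spread evenly.

    host_seek_rates holds one entry per node: the seeks/s that host issues.
    """
    num_devices = len(host_seek_rates) * devices_per_node
    return sum(host_seek_rates) / num_devices
```

With four nodes and an assumed 150 seeks/s per device, `aggregate_seek_capacity(4, 150.0)` gives 600.0, and one host issuing 300 seeks/s while the others are idle puts `per_device_load([300.0, 0.0, 0.0, 0.0])`, that is 75.0 seeks/s, on each device. A single device at 150 seeks/s could not absorb that burst on its own. If every host is equally busy, the per-device load stays at the per-host rate, so the gain comes from uneven load.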

>
> If host2 seldom needs any host1 data, then you are describing
> a configuration that can be done easily by each host having a
> separate filesystem for the device it owns by default.  Each
> host nfs mounts the other host's data and if host1 fails, host2
> can direct mount host1-fs from the shared array.
>

Separate namespaces are uninteresting to me.  That's just pushing the 
problem back to the user.

> Even with multiple disks under the same filesystem as separate
> allocated storage there is still the problem of shared namespace
> metadata that slows down both hosts.  If you don't need shared
> namespaces then you absolutely don't want a cluster fs.
>

If you separate the allocation metadata to the storage owning node, and 
the file metadata to the actively using node, the slowdown should be low 
in most cases.  Problems begin when all nodes access the same file, but 
that's relatively rare.  Even then, when the file size does not change 
and when the data is preallocated, it's possible to achieve acceptable 
overhead.
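
The ownership split described above can be sketched as a toy model. This is only an illustration under stated assumptions, not anything btrfs implements: the `Cluster` class, its fields, and the rule that the first active user of a file keeps its metadata are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical bookkeeping: device id -> node that owns the device.
    device_owner: dict
    # Hypothetical bookkeeping: file path -> node actively using the file.
    file_user: dict

    def allocation_node(self, device):
        """Allocation metadata lives with the node that owns the storage."""
        return self.device_owner[device]

    def file_metadata_node(self, path, requester):
        """File metadata lives with the actively using node.

        Assumption: the first requester becomes the holder; later
        requesters from other nodes are directed to that holder.
        """
        return self.file_user.setdefault(path, requester)

    def is_contended(self, path, requester):
        """True when the requester must go through another node."""
        return self.file_metadata_node(path, requester) != requester
```

For example, with `Cluster({"sda": "host1", "sdb": "host2"}, {})`, allocation on `sdb` always goes to `host2`, a file first opened by `host1` keeps its metadata there, and only a second node touching the same file pays the remote cost. That second case is the "all nodes access the same file" situation.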

> A cluster fs is useful, but the cost can be high so using
> it for a single-host fs is not a good idea.

Development costs, yes.  But I don't see why the runtime overhead can't 
disappear when running on a single host.  Sort of like running an SMP 
kernel on a uniprocessor (I agree the fs problem is much bigger).

-- 
error compiling committee.c: too many arguments to function

