Ceph on just two nodes being clients

* Ceph on just two nodes being clients - reasonable?
@ 2011-01-19 10:33 Tomasz Chmielewski
  2011-01-19 11:30 ` DongJin Lee
  2011-01-19 11:41 ` Wido den Hollander
  0 siblings, 2 replies; 11+ messages in thread
From: Tomasz Chmielewski @ 2011-01-19 10:33 UTC (permalink / raw)
  To: ceph-devel

Is it reasonable to set up Ceph on two nodes which are also Ceph
clients?

Say we have two machines:

   ceph1 -- ceph2

On each of them, the Ceph filesystem is mounted at /shared, which is
used by services like a webserver or a mailserver.

Is it a reasonable approach?

-- 
Tomasz Chmielewski
http://wpkg.org

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 10:33 Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
@ 2011-01-19 11:30 ` DongJin Lee
  2011-01-19 11:41 ` Wido den Hollander
  1 sibling, 0 replies; 11+ messages in thread
From: DongJin Lee @ 2011-01-19 11:30 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: ceph-devel

On Wed, Jan 19, 2011 at 11:33 PM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> Is it reasonable to set up Ceph on two nodes, which are Ceph clients at the
> same time?
>
>
> Say, we have two machines:
>
>  ceph1 -- ceph2
>
>
> On each of them, Ceph filesystem is mounted in /shared, which is used by
> services like a webserver or a mailserver.
>
> Is it a reasonable approach?

I could not run reliable benchmarks (things kept freezing) when the
Ceph client was running on the same machine as the servers.
It might have been fixed by now, but you may also need good machines!

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 10:33 Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
  2011-01-19 11:30 ` DongJin Lee
@ 2011-01-19 11:41 ` Wido den Hollander
  2011-01-19 11:55   ` would you recommend me a solution to store xen-imgfile Longguang Yue
  2011-01-19 12:14   ` Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
  1 sibling, 2 replies; 11+ messages in thread
From: Wido den Hollander @ 2011-01-19 11:41 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: ceph-devel

Hi Thomas,

I think the answer to this question is both yes and no; the devs might
have another approach for your situation.

If you did this, you would have a MON, an MDS and an OSD on every
server; in theory that would work. Mounting would be done by connecting
to one of the MONs (it doesn't matter which one).

But Ceph requires, well, advises an odd number of monitors (Source:
http://ceph.newdream.net/wiki/Designing_a_cluster ), because a
majority of the monitors must be up to form a quorum.

So you would need a third node running a third monitor to keep
track of both nodes.
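
To make the odd-number advice concrete: the monitors need a strict majority (a quorum) to agree on cluster state, so the number of monitor failures a cluster can survive only grows at odd counts. A small Python sketch of that arithmetic (the function names are ours, not Ceph's):

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of a monitor group."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors may fail while a majority can still form."""
    return monitors - quorum_size(monitors)


for n in range(1, 6):
    print(f"{n} monitor(s): quorum of {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

With two monitors the cluster tolerates no more failures than with one (and four tolerate no more than three), which is why a two-node setup needs a third, tie-breaking monitor before it gains any monitor redundancy.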

My advice for two nodes: use something like DRBD in Primary/Primary
(dual-primary) mode with a cluster filesystem like OCFS2.

Wido

On Wed, 2011-01-19 at 11:33 +0100, Tomasz Chmielewski wrote:
> Is it reasonable to set up Ceph on two nodes, which are Ceph clients at 
> the same time?
> 
> 
> Say, we have two machines:
> 
>    ceph1 -- ceph2
> 
> 
> On each of them, Ceph filesystem is mounted in /shared, which is used by 
> services like a webserver or a mailserver.
> 
> Is it a reasonable approach?
> 
> 

* would you recommend me  a solution to store xen-imgfile
  2011-01-19 11:41 ` Wido den Hollander
@ 2011-01-19 11:55   ` Longguang Yue
  2011-01-19 17:06     ` Sage Weil
  2011-01-19 12:14   ` Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
  1 sibling, 1 reply; 11+ messages in thread
From: Longguang Yue @ 2011-01-19 11:55 UTC (permalink / raw)
  To: ceph-devel, ceph-devel-owner; +Cc: Wido den Hollander, Tomasz Chmielewski

Could you recommend a solution for storing Xen image files? My
requirements are:
1. stability
2. throughput
3. redundancy
Is Ceph suitable for storing Xen image files?

* Re: would you recommend me  a solution to store xen-imgfile
  2011-01-19 11:55   ` would you recommend me a solution to store xen-imgfile Longguang Yue
@ 2011-01-19 17:06     ` Sage Weil
  0 siblings, 0 replies; 11+ messages in thread
From: Sage Weil @ 2011-01-19 17:06 UTC (permalink / raw)
  To: Longguang Yue; +Cc: ceph-devel, Wido den Hollander, Tomasz Chmielewski

Hi Longguang,

On Wed, 19 Jan 2011, Longguang Yue wrote:
> would you recommend me  a solution to store xen-imgfile
> 1. stability
> 2. throughout
> 3. redundancy
> If ceph suitable for storing xen-imgfile??

I would suggest using the kernel RBD driver, now present in Linux 2.6.37. See

	http://ceph.newdream.net/wiki/Rbd
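
As a rough sketch of what this looks like with the `rbd` command-line tool shipped by current Ceph releases (the 2011 wiki page may describe an older, sysfs-based interface), the helper below only builds the commands that create an image and map it through the kernel driver, plus a matching Xen `disk` entry. The pool and image names are placeholders, and the `/dev/rbd/<pool>/<image>` path assumes the udev rules that Ceph installs.

```python
import shlex


def rbd_setup_commands(pool, image, size_mb):
    """Build argv lists that create an RBD image and map it via the kernel driver."""
    spec = f"{pool}/{image}"
    return [
        ["rbd", "create", "--size", str(size_mb), spec],  # --size defaults to megabytes
        ["rbd", "map", spec],  # the kernel rbd driver exposes the image as /dev/rbdN
    ]


def xen_disk_entry(pool, image, vdev="xvda"):
    """Xen 'phy:' disk spec pointing at the udev-created /dev/rbd/<pool>/<image> link."""
    return f"phy:/dev/rbd/{pool}/{image},{vdev},w"


# Placeholder names: pool "rbd", image "xen-guest1", 10 GiB.
for argv in rbd_setup_commands("rbd", "xen-guest1", 10240):
    print(shlex.join(argv))
print(f"disk = ['{xen_disk_entry('rbd', 'xen-guest1')}']")
```

Actually running the commands needs a reachable cluster and root privileges for `rbd map`; keeping them as argv lists rather than one shell string makes them safe to pass to `subprocess.run`.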

sage

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 11:41 ` Wido den Hollander
  2011-01-19 11:55   ` would you recommend me a solution to store xen-imgfile Longguang Yue
@ 2011-01-19 12:14   ` Tomasz Chmielewski
  2011-01-19 15:57     ` Gregory Farnum
  1 sibling, 1 reply; 11+ messages in thread
From: Tomasz Chmielewski @ 2011-01-19 12:14 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel

On 19.01.2011 12:41, Wido den Hollander wrote:
> Hi Thomas,
>
> I think the answer is Yes and No on this question, the devs might have
> another approach for your situation.
>
> If you would do this, you would have a MON, MDS and OSD on every server,
> in theory that would work. Mounting would be done by connecting to on of
> the MON's (doesn't matter which one).
>
> But Ceph requires, well, advises a odd number of monitors (Source:
> http://ceph.newdream.net/wiki/Designing_a_cluster )
>
> So you would require a third node which is running your third monitor to
> keep track of both nodes.
>
> My advise, for two nodes, use something like DRBD in Primary<>  Primary
> and use a cluster filesystem like OCFS2.

Currently, I'm running GlusterFS in such a scenario (two servers, each
also acting as a client), but I wanted to give Ceph a try, partly because
GlusterFS has performance issues with lots of small files and partly
because of Ceph's nice features (snapshots, RBD, etc.).


-- 
Tomasz Chmielewski
http://wpkg.org

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 12:14   ` Ceph on just two nodes being clients - reasonable? Tomasz Chmielewski
@ 2011-01-19 15:57     ` Gregory Farnum
  2011-01-19 16:21       ` Tomasz Chmielewski
  2011-01-19 17:55       ` Colin McCabe
  0 siblings, 2 replies; 11+ messages in thread
From: Gregory Farnum @ 2011-01-19 15:57 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Wido den Hollander, ceph-devel

On Wed, Jan 19, 2011 at 4:14 AM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> On 19.01.2011 12:41, Wido den Hollander wrote:
>>
>> Hi Thomas,
>>
>> I think the answer is Yes and No on this question, the devs might have
>> another approach for your situation.
>>
>> If you would do this, you would have a MON, MDS and OSD on every server,
>> in theory that would work. Mounting would be done by connecting to on of
>> the MON's (doesn't matter which one).
>>
>> But Ceph requires, well, advises a odd number of monitors (Source:
>> http://ceph.newdream.net/wiki/Designing_a_cluster )
>>
>> So you would require a third node which is running your third monitor to
>> keep track of both nodes.
>>
>> My advise, for two nodes, use something like DRBD in Primary<>  Primary
>> and use a cluster filesystem like OCFS2.
>
> Currently, I'm running glusterfs in such a scenario (two servers, each being
> also clients), but I wanted to give ceph a try (glusterfs has some
> performance issues with lots of small files), also because of its nice
> features (snapshots, rbd etc.).
Rather than running 3 monitors you could just put a monitor on one of
the machines -- your cluster will go down if that machine fails, but in
a 2-node system it's not like resilience from one-node failure would be
very helpful anyway.

However, there is a serious issue with running clients and servers on
one machine, which may or may not be a problem depending on your use
case: Deadlock becomes a significant possibility. This isn't a problem
we've come up with a good solution for, unfortunately, but imagine
you're writing a lot of files to Ceph. Ceph dutifully writes them and
the kernel dutifully caches them. You also have a lot of write
activity so the Ceph kernel client is doing local caching. Then the
kernel comes along and says "I'm low on memory! Flush stuff to disk!"
and the kernel client tries to flush it out...which involves creating
another copy of the data in memory on the same machine. Uh-oh!
Now if you use the FUSE client this won't be an issue, but your
performance also won't be so good. :/
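
A toy model of the loop described above may help: dirty client pages can only be freed by writing them through the local OSD, and that write first needs some free memory of its own. The sketch below is not Ceph or kernel code; the page counts, `copy_cost` and the `reserve` pool (pages set aside only for writeback) are hypothetical knobs chosen to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One machine that is both a Ceph kernel client and an OSD (toy model)."""
    total_pages: int
    dirty: int = 0    # client pages cached but not yet written to the OSD
    reserve: int = 0  # pages only writeback may use (hypothetical)

    @property
    def free(self) -> int:
        return self.total_pages - self.dirty - self.reserve


def write(node: Node, pages: int) -> int:
    """The client caches up to `pages` dirty pages; returns how many fit."""
    cached = min(pages, max(node.free, 0))
    node.dirty += cached
    return cached


def flush_one(node: Node, copy_cost: int) -> bool:
    """Write one dirty page back through the local OSD.

    The OSD path needs `copy_cost` free pages (network buffers plus its own
    copy) before the dirty page can be released. Returns False if it cannot start.
    """
    if node.dirty == 0 or node.free + node.reserve < copy_cost:
        return False
    node.dirty -= 1  # temporary copies are released once the write completes
    return True


def reclaim(node: Node, copy_cost: int, target_free: int) -> bool:
    """Flush until `target_free` pages are free; False means reclaim got stuck."""
    while node.free < target_free:
        if not flush_one(node, copy_cost):
            return False  # cannot make progress: freeing memory needs memory
    return True


tight = Node(total_pages=100)
write(tight, 100)
print("no reserve:", reclaim(tight, copy_cost=2, target_free=10))

spare = Node(total_pages=100, reserve=2)
write(spare, 100)
print("with reserve:", reclaim(spare, copy_cost=2, target_free=10))
```

With no reserve, reclaim stalls on the very first flush; keeping even a small pool that only writeback may use lets it make progress. In a real system the hard part is guaranteeing such a reserve for every allocation along the network and OSD path.
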
-Greg

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 15:57     ` Gregory Farnum
@ 2011-01-19 16:21       ` Tomasz Chmielewski
  2011-01-19 17:55       ` Colin McCabe
  1 sibling, 0 replies; 11+ messages in thread
From: Tomasz Chmielewski @ 2011-01-19 16:21 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Wido den Hollander, ceph-devel

On 19.01.2011 16:57, Gregory Farnum wrote:

>> Currently, I'm running glusterfs in such a scenario (two servers, each being
>> also clients), but I wanted to give ceph a try (glusterfs has some
>> performance issues with lots of small files), also because of its nice
>> features (snapshots, rbd etc.).
> Rather than running 3 monitors you could just put a monitor on one of
> the machines -- your cluster will go down if it fails, but in a 2-node
> system it's not like resilience from one-node failure would be very
> helpful anyway.

OK, I could imagine running the monitor on just one node, e.g. with the
help of Heartbeat: if the node with the monitor goes down, Heartbeat
starts the monitor process on the other machine.
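
The failover described above boils down to a timeout on the peer's heartbeats. A minimal sketch of that decision logic, assuming a hypothetical `MonitorFailover` helper driven by some cluster manager; it does not start any real daemon, it only reports when one should be started:

```python
import time


class MonitorFailover:
    """Decide when this node should take over the single Ceph monitor (toy model)."""

    def __init__(self, timeout=10.0, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence before the peer is presumed dead
        self.clock = clock        # injectable so the logic can be tested without sleeping
        self.last_seen = clock()
        self.monitor_here = False

    def on_heartbeat(self):
        """Call whenever a heartbeat arrives from the node running the monitor."""
        self.last_seen = self.clock()

    def peer_alive(self):
        return self.clock() - self.last_seen < self.timeout

    def tick(self):
        """Return "start" once when the monitor should be started here, else None."""
        if not self.monitor_here and not self.peer_alive():
            self.monitor_here = True
            return "start"
        return None


now = [0.0]
fo = MonitorFailover(timeout=10, clock=lambda: now[0])
now[0] = 5.0
fo.on_heartbeat()
now[0] = 14.0
print(fo.tick())  # None: last heartbeat was 9 s ago
now[0] = 15.0
print(fo.tick())  # start: 10 s of silence
```

A real setup would also have to move the monitor's data directory and its IP address (monitors are reached at the address recorded in the monitor map), and without fencing a network split could leave both nodes starting a monitor.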

> However, there is a serious issue with running clients and servers on
> one machine, which may or may not be a problem depending on your use
> case: Deadlock becomes a significant possibility.

Could that be the "freezes" issue DongJin Lee mentioned?

> This isn't a problem
> we've come up with a good solution for, unfortunately, but imagine
> you're writing a lot of files to Ceph. Ceph dutifully writes them and
> the kernel dutifully caches them. You also have a lot of write
> activity so the Ceph kernel client is doing local caching. Then the
> kernel comes along and says "I'm low on memory! Flush stuff to disk!"
> and the kernel client tries to flush it out...which involves creating
> another copy of the data in memory on the same machine. Uh-oh!

Uh-oh indeed. That doesn't sound encouraging, and it will likely happen sooner or later.

Would some sort of zero-copy help here? Perhaps it's not that easy to
solve; otherwise we wouldn't be discussing it.

I think swapping over NFS (or iSCSI) has a similar problem ("need to
write, but the network buffer is full, so we can't write over the
network -> deadlock"), and there were some patches floating around a few
years ago to solve it. I'm not sure what state they are in, or how
similar the problem is to Ceph's.

Last I checked, 2 < 3, so having a budget HA solution which just needed 
2 servers instead of 3 would be a great thing to have! ;)

-- 
Tomasz Chmielewski
http://wpkg.org

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 15:57     ` Gregory Farnum
  2011-01-19 16:21       ` Tomasz Chmielewski
@ 2011-01-19 17:55       ` Colin McCabe
  2011-01-19 20:03         ` Tommi Virtanen
  1 sibling, 1 reply; 11+ messages in thread
From: Colin McCabe @ 2011-01-19 17:55 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Tomasz Chmielewski, Wido den Hollander, ceph-devel

On Wed, Jan 19, 2011 at 7:57 AM, Gregory Farnum <gregf@hq.newdream.net> wrote:

> However, there is a serious issue with running clients and servers on
> one machine, which may or may not be a problem depending on your use
> case: Deadlock becomes a significant possibility. This isn't a problem
> we've come up with a good solution for, unfortunately, but imagine
> you're writing a lot of files to Ceph. Ceph dutifully writes them and
> the kernel dutifully caches them. You also have a lot of write
> activity so the Ceph kernel client is doing local caching. Then the
> kernel comes along and says "I'm low on memory! Flush stuff to disk!"
> and the kernel client tries to flush it out...which involves creating
> another copy of the data in memory on the same machine. Uh-oh!
> Now if you use the FUSE client this won't be an issue, but your
> performance also won't be so good. :/

If you knew what the maximum memory consumption for the daemons would
be, you could use mlock to lock all those pages into memory (making them
unswappable). Then you could use rlimit to ensure that if the daemon
ever tried to allocate more than that, it would be killed.

That would prevent the scenario you outlined above, where there is not
enough memory to flush the page cache. Of course, for this to be
feasible we would first need to reduce the daemons' memory consumption
and make it deterministic.
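
A minimal Python sketch of the two knobs mentioned above, using `resource.setrlimit` with `RLIMIT_AS` and calling `mlockall(2)` through `ctypes`. Two caveats: with `RLIMIT_AS` an over-budget allocation fails with `ENOMEM` (Python raises `MemoryError`) rather than the process being killed outright, so a daemon would have to treat that failure as fatal itself; and this bounds only the daemon's own memory, not the kernel's page cache. The `MCL_*` values are the Linux x86/ARM ones and are an assumption on other architectures.

```python
import ctypes
import ctypes.util
import os
import resource

# mlockall(2) flags as defined on x86 and ARM Linux (other architectures differ).
MCL_CURRENT = 1
MCL_FUTURE = 2


def cap_address_space(max_bytes: int) -> None:
    """Make allocations beyond max_bytes of virtual memory fail with ENOMEM."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # an unprivileged process cannot exceed its hard limit
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


def lock_all_memory() -> None:
    """Pin current and future pages in RAM so they are never swapped out."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A daemon would call `cap_address_space()` and then `lock_all_memory()` at startup; `mlockall` needs `CAP_IPC_LOCK` or a large enough `RLIMIT_MEMLOCK`, otherwise it fails with `EPERM` or `ENOMEM`.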

cheers,
Colin

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 17:55       ` Colin McCabe
@ 2011-01-19 20:03         ` Tommi Virtanen
  2011-01-19 21:09           ` Colin McCabe
  0 siblings, 1 reply; 11+ messages in thread
From: Tommi Virtanen @ 2011-01-19 20:03 UTC (permalink / raw)
  To: Colin McCabe
  Cc: Gregory Farnum, Tomasz Chmielewski, Wido den Hollander,
	ceph-devel

On Wed, Jan 19, 2011 at 09:55:27AM -0800, Colin McCabe wrote:
> If you knew what the maximum memory consumption for the daemons would
> be, you could use mlock to lock all those pages into memory (make them
> unswappable.) Then you could use rlimit to ensure that if the daemon
> ever tried to allocate more than that, it would be killed.

The classic nfs loopback mount deadlock is less about how much memory
the daemons are grabbing via malloc etc, and more about the buffer
cache management in kernel.

With a "loopback ceph", pressure from activity on the kernel ceph
client mountpoint might interact badly with the buffer cache the OSD
needs to work well, whether the OSD userspace tries to limit itself or
not.

It's one of those "it'll work until you have a bad day" things.

http://www.webservertalk.com/archive242-2007-10-2051163.html

https://bugzilla.redhat.com/show_bug.cgi?id=489889

http://lkml.org/lkml/2006/12/14/448

http://docs.google.com/viewer?a=v&q=cache:ONtIKJFSC7QJ:https://tao.truststc.org/Members/hweather/advanced_storage/Public%2520resources/network/nfs_user+nfs+loopback+deadlock+linux&hl=en&gl=us&pid=bl&srcid=ADGEESgpaVYYNoh2pmvPVQ9I_bpLLcoF3GJIMKavomIHNgTb-cbii6RVtWg28poJKdHBqQgKGXzVA2NOsC25FtWMP3yywTfNkX9N26IrKVIcVA9eRz6ZGBx1_Ur0JerUrfBQlPcmcBBz&sig=AHIEtbSjGX_hCVny345iFSq7WKBvxNZmIw
(slide 5)

-- 
:(){ :|:&};:

* Re: Ceph on just two nodes being clients - reasonable?
  2011-01-19 20:03         ` Tommi Virtanen
@ 2011-01-19 21:09           ` Colin McCabe
  0 siblings, 0 replies; 11+ messages in thread
From: Colin McCabe @ 2011-01-19 21:09 UTC (permalink / raw)
  To: Tommi Virtanen
  Cc: Gregory Farnum, Tomasz Chmielewski, Wido den Hollander,
	ceph-devel

On Wed, Jan 19, 2011 at 12:03 PM, Tommi Virtanen
<tommi.virtanen@dreamhost.com> wrote:
> On Wed, Jan 19, 2011 at 09:55:27AM -0800, Colin McCabe wrote:
>> If you knew what the maximum memory consumption for the daemons would
>> be, you could use mlock to lock all those pages into memory (make them
>> unswappable.) Then you could use rlimit to ensure that if the daemon
>> ever tried to allocate more than that, it would be killed.
>
> The classic nfs loopback mount deadlock is less about how much memory
> the daemons are grabbing via malloc etc, and more about the buffer
> cache management in kernel.

My understanding is that nfsd tries to allocate memory, which turns
out to be impossible because the page cache is occupying that memory,
and the page cache in turn needs nfsd to drain it.

I guess the question you are asking is whether nfsd just doing I/O
requires kernel memory that might not be available. I'm not entirely
sure about the answer to that. Unfortunately, none of those links has
any information on the subject (I had high hopes for the lkml one, but
it was about an unrelated race in NFS).

Colin
