* Ceph on just two nodes being clients - reasonable?
From: Tomasz Chmielewski @ 2011-01-19 10:33 UTC
To: ceph-devel

Is it reasonable to set up Ceph on two nodes, which are Ceph clients at the same time?

Say, we have two machines:

ceph1 -- ceph2

On each of them, Ceph filesystem is mounted in /shared, which is used by services like a webserver or a mailserver.

Is it a reasonable approach?

--
Tomasz Chmielewski
http://wpkg.org

* Re: Ceph on just two nodes being clients - reasonable?
From: DongJin Lee @ 2011-01-19 11:30 UTC
To: Tomasz Chmielewski; +Cc: ceph-devel

On Wed, Jan 19, 2011 at 11:33 PM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> Is it reasonable to set up Ceph on two nodes, which are Ceph clients at the
> same time?
>
> Say, we have two machines:
>
> ceph1 -- ceph2
>
> On each of them, Ceph filesystem is mounted in /shared, which is used by
> services like a webserver or a mailserver.
>
> Is it a reasonable approach?

I could not run a reliable benchmark (it kept freezing) when the Ceph client was sitting on the same machine. It might have been fixed by now, but you may also need good machines!

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: Ceph on just two nodes being clients - reasonable?
From: Wido den Hollander @ 2011-01-19 11:41 UTC
To: Tomasz Chmielewski; +Cc: ceph-devel

Hi Thomas,

I think the answer to this question is both yes and no; the devs might have another approach for your situation.

If you did this, you would have a MON, MDS and OSD on every server, which in theory would work. Mounting would be done by connecting to one of the MONs (it doesn't matter which one).

But Ceph requires, or rather advises, an odd number of monitors (source: http://ceph.newdream.net/wiki/Designing_a_cluster).

So you would need a third node running a third monitor to keep track of both nodes.

My advice for two nodes: use something like DRBD in Primary <> Primary mode with a cluster filesystem like OCFS2.

Wido

On Wed, 2011-01-19 at 11:33 +0100, Tomasz Chmielewski wrote:
> Is it reasonable to set up Ceph on two nodes, which are Ceph clients at
> the same time?
>
> Say, we have two machines:
>
> ceph1 -- ceph2
>
> On each of them, Ceph filesystem is mounted in /shared, which is used by
> services like a webserver or a mailserver.
>
> Is it a reasonable approach?
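
The odd-number advice above can be illustrated with a little arithmetic. This is only a sketch, assuming the monitors need a strict majority to form a quorum; the function names are made up for the illustration and are not part of Ceph.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors` (assumed quorum rule)."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Print monitor count, quorum size and tolerated failures for small clusters.
for n in range(1, 6):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under that assumption, two monitors tolerate no failures at all (the same as one monitor), while three tolerate one, which is why a two-node setup gains nothing from a second monitor.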

* would you recommend me a solution to store xen-imgfile
From: Longguang Yue @ 2011-01-19 11:55 UTC
To: ceph-devel, ceph-devel-owner; +Cc: Wido den Hollander, Tomasz Chmielewski

Would you recommend a solution for storing Xen image files (xen-imgfile)? The requirements are:

1. stability
2. throughput
3. redundancy

Is Ceph suitable for storing Xen image files?

* Re: would you recommend me a solution to store xen-imgfile
From: Sage Weil @ 2011-01-19 17:06 UTC
To: Longguang Yue; +Cc: ceph-devel, Wido den Hollander, Tomasz Chmielewski

Hi Longguang,

On Wed, 19 Jan 2011, Longguang Yue wrote:
> would you recommend me a solution to store xen-imgfile
> 1. stability
> 2. throughput
> 3. redundancy
> Is ceph suitable for storing xen-imgfile??

I would suggest using the kernel RBD driver, now present in 2.6.37. See http://ceph.newdream.net/wiki/Rbd

sage

* Re: Ceph on just two nodes being clients - reasonable?
From: Tomasz Chmielewski @ 2011-01-19 12:14 UTC
To: Wido den Hollander; +Cc: ceph-devel

On 19.01.2011 12:41, Wido den Hollander wrote:
> Hi Thomas,
>
> I think the answer is Yes and No on this question, the devs might have
> another approach for your situation.
>
> If you would do this, you would have a MON, MDS and OSD on every server,
> in theory that would work. Mounting would be done by connecting to one of
> the MONs (doesn't matter which one).
>
> But Ceph requires, well, advises an odd number of monitors (Source:
> http://ceph.newdream.net/wiki/Designing_a_cluster )
>
> So you would require a third node which is running your third monitor to
> keep track of both nodes.
>
> My advice, for two nodes, use something like DRBD in Primary <> Primary
> and use a cluster filesystem like OCFS2.

Currently, I'm running glusterfs in such a scenario (two servers, each also being a client), but I wanted to give Ceph a try, partly because glusterfs has some performance issues with lots of small files and partly because of Ceph's nice features (snapshots, rbd etc.).

--
Tomasz Chmielewski
http://wpkg.org

* Re: Ceph on just two nodes being clients - reasonable?
From: Gregory Farnum @ 2011-01-19 15:57 UTC
To: Tomasz Chmielewski; +Cc: Wido den Hollander, ceph-devel

On Wed, Jan 19, 2011 at 4:14 AM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:
> On 19.01.2011 12:41, Wido den Hollander wrote:
>>
>> Hi Thomas,
>>
>> I think the answer is Yes and No on this question, the devs might have
>> another approach for your situation.
>>
>> If you would do this, you would have a MON, MDS and OSD on every server,
>> in theory that would work. Mounting would be done by connecting to one of
>> the MONs (doesn't matter which one).
>>
>> But Ceph requires, well, advises an odd number of monitors (Source:
>> http://ceph.newdream.net/wiki/Designing_a_cluster )
>>
>> So you would require a third node which is running your third monitor to
>> keep track of both nodes.
>>
>> My advice, for two nodes, use something like DRBD in Primary <> Primary
>> and use a cluster filesystem like OCFS2.
>
> Currently, I'm running glusterfs in such a scenario (two servers, each being
> also clients), but I wanted to give ceph a try (glusterfs has some
> performance issues with lots of small files), also because of its nice
> features (snapshots, rbd etc.).

Rather than running 3 monitors you could just put a monitor on one of the machines -- your cluster will go down if it fails, but in a 2-node system it's not like resilience from one-node failure would be very helpful anyway.

However, there is a serious issue with running clients and servers on one machine, which may or may not be a problem depending on your use case: deadlock becomes a significant possibility. This isn't a problem we've come up with a good solution for, unfortunately, but imagine you're writing a lot of files to Ceph. Ceph dutifully writes them and the kernel dutifully caches them. You also have a lot of write activity, so the Ceph kernel client is doing local caching. Then the kernel comes along and says "I'm low on memory! Flush stuff to disk!" and the kernel client tries to flush it out... which involves creating another copy of the data in memory on the same machine. Uh-oh!

Now if you use the FUSE client this won't be an issue, but your performance also won't be so good. :/

-Greg
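
The scenario Greg describes can be sketched as a toy model. This is a deliberately simplified illustration, not a model of the real kernel page cache: memory is counted in whole pages, the hypothetical `flush_dirty_pages` function is invented for this example, and the assumption is that a co-located OSD needs one free page to receive each copy before the client's dirty page can be released.

```python
def flush_dirty_pages(total_pages: int, dirty_pages: int, osd_is_local: bool):
    """Toy flush loop. Returns (pages_flushed, stuck).

    Assumption: with a local OSD, flushing one page needs one free page
    for the OSD's copy; the OSD gives that page back after writing it out.
    With a remote OSD, the copy lives on another machine.
    """
    free = total_pages - dirty_pages
    flushed = 0
    while dirty_pages > 0:
        if osd_is_local and free == 0:
            # No room for the second copy, so nothing can ever be freed.
            return flushed, True
        dirty_pages -= 1  # the client's page is clean now and can be dropped
        free += 1
        flushed += 1
    return flushed, False


print(flush_dirty_pages(100, 100, osd_is_local=True))
print(flush_dirty_pages(100, 100, osd_is_local=False))
```

In this toy, a machine whose memory is entirely filled with dirty client pages makes no progress when the OSD is local, but flushes everything when the OSD is on another machine. Real kernels keep reserves and the OSD has its own caching, so this only shows the shape of the problem.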

* Re: Ceph on just two nodes being clients - reasonable?
From: Tomasz Chmielewski @ 2011-01-19 16:21 UTC
To: Gregory Farnum; +Cc: Wido den Hollander, ceph-devel

On 19.01.2011 16:57, Gregory Farnum wrote:
>> Currently, I'm running glusterfs in such a scenario (two servers, each being
>> also clients), but I wanted to give ceph a try (glusterfs has some
>> performance issues with lots of small files), also because of its nice
>> features (snapshots, rbd etc.).
> Rather than running 3 monitors you could just put a monitor on one of
> the machines -- your cluster will go down if it fails, but in a 2-node
> system it's not like resilience from one-node failure would be very
> helpful anyway.

OK, I could imagine starting the monitor on just one node, e.g. with the help of heartbeat, so that if the node with the monitor goes down, heartbeat starts the monitor process on the other machine.

> However, there is a serious issue with running clients and servers on
> one machine, which may or may not be a problem depending on your use
> case: Deadlock becomes a significant possibility.

Sounds like the "freezes" issue mentioned by DongJin Lee?

> This isn't a problem
> we've come up with a good solution for, unfortunately, but imagine
> you're writing a lot of files to Ceph. Ceph dutifully writes them and
> the kernel dutifully caches them. You also have a lot of write
> activity so the Ceph kernel client is doing local caching. Then the
> kernel comes along and says "I'm low on memory! Flush stuff to disk!"
> and the kernel client tries to flush it out...which involves creating
> another copy of the data in memory on the same machine. Uh-oh!

Uh-oh, that doesn't sound encouraging, and it will likely happen sooner or later.

Would some sort of zero-copy help here? But perhaps it's not that easy to solve; otherwise we wouldn't be discussing it here.

I think swapping over NFS (or iSCSI) has a similar problem ("need to write, but the network buffer is full, so we can't write over the network -> deadlock"), and there were some patches floating around some years ago to solve it. I'm not sure what state they are in or how similar the problem is to Ceph's, though.

Last I checked, 2 < 3, so having a budget HA solution which just needed 2 servers instead of 3 would be a great thing to have! ;)

--
Tomasz Chmielewski
http://wpkg.org

* Re: Ceph on just two nodes being clients - reasonable?
From: Colin McCabe @ 2011-01-19 17:55 UTC
To: Gregory Farnum; +Cc: Tomasz Chmielewski, Wido den Hollander, ceph-devel

On Wed, Jan 19, 2011 at 7:57 AM, Gregory Farnum <gregf@hq.newdream.net> wrote:
> However, there is a serious issue with running clients and servers on
> one machine, which may or may not be a problem depending on your use
> case: Deadlock becomes a significant possibility. This isn't a problem
> we've come up with a good solution for, unfortunately, but imagine
> you're writing a lot of files to Ceph. Ceph dutifully writes them and
> the kernel dutifully caches them. You also have a lot of write
> activity so the Ceph kernel client is doing local caching. Then the
> kernel comes along and says "I'm low on memory! Flush stuff to disk!"
> and the kernel client tries to flush it out...which involves creating
> another copy of the data in memory on the same machine. Uh-oh!
> Now if you use the FUSE client this won't be an issue, but your
> performance also won't be so good. :/

If you knew what the maximum memory consumption for the daemons would be, you could use mlock to lock all those pages into memory (make them unswappable). Then you could use rlimit to ensure that if the daemon ever tried to allocate more than that, it would be killed. That would prevent the scenario you outlined above, where there is not enough memory to flush the page cache.

Of course, for this to be feasible, we would need to reduce the daemons' memory consumption and make it deterministic.

cheers,
Colin
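
Colin's mlock-plus-rlimit idea could look roughly like the following in a daemon's startup path. This is a minimal sketch under several assumptions, not how the Ceph daemons work: it assumes Linux, the `MCL_*` values are the common x86 ones from `<sys/mman.h>`, the helper names are hypothetical, and `RLIMIT_AS` is used as the cap. Note that with `RLIMIT_AS` an over-limit allocation fails with ENOMEM rather than the process being killed outright, so the daemon would have to treat that failure as fatal itself.

```python
import ctypes
import ctypes.util
import os
import resource

# Assumption: Linux values from <sys/mman.h> on x86; some architectures differ.
MCL_CURRENT = 1
MCL_FUTURE = 2


def clamp_to_hard_limit(wanted: int, hard: int) -> int:
    """A soft limit may not exceed the hard limit unless the latter is unlimited."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def cap_address_space(max_bytes: int) -> None:
    """Cap this process's address space so larger allocations fail."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (clamp_to_hard_limit(max_bytes, hard), hard))


def lock_all_memory() -> bool:
    """Pin current and future pages in RAM; needs privilege or a high RLIMIT_MEMLOCK."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("mlockall failed:", os.strerror(ctypes.get_errno()))
        return False
    return True
```

A daemon would call `cap_address_space(...)` with its known worst-case footprint and then `lock_all_memory()` early at startup. As Colin notes, this only makes sense once that worst case is small and predictable.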

* Re: Ceph on just two nodes being clients - reasonable?
From: Tommi Virtanen @ 2011-01-19 20:03 UTC
To: Colin McCabe
Cc: Gregory Farnum, Tomasz Chmielewski, Wido den Hollander, ceph-devel

On Wed, Jan 19, 2011 at 09:55:27AM -0800, Colin McCabe wrote:
> If you knew what the maximum memory consumption for the daemons would
> be, you could use mlock to lock all those pages into memory (make them
> unswappable.) Then you could use rlimit to ensure that if the daemon
> ever tried to allocate more than that, it would be killed.

The classic nfs loopback mount deadlock is less about how much memory the daemons are grabbing via malloc etc, and more about the buffer cache management in kernel. With a "loopback ceph", pressure from activity on the kernel ceph client mountpoint might interact badly with the buffer cache the OSD needs to work well, whether the OSD userspace tries to limit itself or not.

It's one of those "it'll work until you have a bad day" things.

http://www.webservertalk.com/archive242-2007-10-2051163.html
https://bugzilla.redhat.com/show_bug.cgi?id=489889
http://lkml.org/lkml/2006/12/14/448
http://docs.google.com/viewer?a=v&q=cache:ONtIKJFSC7QJ:https://tao.truststc.org/Members/hweather/advanced_storage/Public%2520resources/network/nfs_user+nfs+loopback+deadlock+linux&hl=en&gl=us&pid=bl&srcid=ADGEESgpaVYYNoh2pmvPVQ9I_bpLLcoF3GJIMKavomIHNgTb-cbii6RVtWg28poJKdHBqQgKGXzVA2NOsC25FtWMP3yywTfNkX9N26IrKVIcVA9eRz6ZGBx1_Ur0JerUrfBQlPcmcBBz&sig=AHIEtbSjGX_hCVny345iFSq7WKBvxNZmIw (slide 5)

--
:(){ :|:&};:

* Re: Ceph on just two nodes being clients - reasonable?
From: Colin McCabe @ 2011-01-19 21:09 UTC
To: Tommi Virtanen
Cc: Gregory Farnum, Tomasz Chmielewski, Wido den Hollander, ceph-devel

On Wed, Jan 19, 2011 at 12:03 PM, Tommi Virtanen <tommi.virtanen@dreamhost.com> wrote:
> On Wed, Jan 19, 2011 at 09:55:27AM -0800, Colin McCabe wrote:
>> If you knew what the maximum memory consumption for the daemons would
>> be, you could use mlock to lock all those pages into memory (make them
>> unswappable.) Then you could use rlimit to ensure that if the daemon
>> ever tried to allocate more than that, it would be killed.
>
> The classic nfs loopback mount deadlock is less about how much memory
> the daemons are grabbing via malloc etc, and more about the buffer
> cache management in kernel.

My understanding is that nfsd tries to allocate memory, which turns out to be impossible because the page cache is occupying that memory, and draining the page cache in turn requires nfsd.

I guess the question you are asking is whether nfsd, just by doing I/O, requires kernel memory that might not be available. I'm not entirely sure about the answer to that. Unfortunately, none of those links has any information on the subject (I had high hopes for the lkml one, but it was about an unrelated race in NFS).

Colin