* Performance test on Ceph cluster
From: madhusudhana @ 2012-02-22 9:39 UTC
To: ceph-devel

Hi
I have finally configured a ceph cluster with 8 nodes. I have 2 MDS
servers and 3 monitors and rest of 3 nodes are OSD. Each system has
2T SATA drives. I have 3 partitions created, one for root file
system, another for CEPH journal and the rest of the space is for
OSD. I was able to get 5.6T space from three nodes.

All the machines are of same type (HP DL160 G7) with 48 of RAM and
quad core dual cpu's.

I am using iozone for testing the performance against NetApp filer
Below is the command what I am using for iozone test

/opt/iozone/bin/iozone -R -e -l i -u 1 -r 4096k -s 1024m -F /mnt/ceph-test/ceph.iozone

When I see the result, the value from ceph cluster is not at all
coming near to an entry level NetApp filer. I would like to
know the ideal way (if there is any) for testing ceph cluster
performance so that we can compare it with other file systems.
If required, I can share the test results.

Any suggestion greatly appreciated.

Thanks
__madhusudhan

* Re: Performance test on Ceph cluster
From: Gregory Farnum @ 2012-02-22 17:55 UTC
To: madhusudhana; +Cc: ceph-devel

On Wed, Feb 22, 2012 at 1:39 AM, madhusudhana
<madhusudhana.u.acharya@gmail.com> wrote:
> Hi
> I have finally configured a ceph cluster with 8 nodes. I have 2 MDS
> servers and 3 monitors and rest of 3 nodes are OSD. Each system has
> 2T SATA drives. I have 3 partitions created, one for root file
> system, another for CEPH journal and the rest of the space is for
> OSD. I was able to get 5.6T space from three nodes.
>
> All the machines are of same type (HP DL160 G7) with 48 of RAM and
> quad core dual cpu's.
>
> I am using iozone for testing the performance against NetApp filer
> Below is the command what I am using for iozone test
>
> /opt/iozone/bin/iozone -R -e -l i -u 1 -r 4096k -s 1024m -F /mnt/ceph-test/ceph.iozone

If I'm reading it correctly you're using Direct IO? That's almost
certainly just going to be slow...

> When I see the result, the value from ceph cluster is not at all
> coming near to an entry level NetApp filer. I would like to
> know the ideal way (if there is any) for testing ceph cluster
> performance so that we can compare it with other file systems.
> If required, I can share the test results.

Please do, or we won't have anywhere to go off of. :)
And are you running this on GigE or 10GigE?
-Greg

* Re: Performance test on Ceph cluster
From: Tommi Virtanen @ 2012-02-22 18:42 UTC
To: madhusudhana; +Cc: ceph-devel

On Wed, Feb 22, 2012 at 01:39, madhusudhana
<madhusudhana.u.acharya@gmail.com> wrote:
> Hi
> I have finally configured a ceph cluster with 8 nodes. I have 2 MDS
> servers and 3 monitors and rest of 3 nodes are OSD. Each system has
> 2T SATA drives. I have 3 partitions created, one for root file
> system, another for CEPH journal and the rest of the space is for
> OSD. I was able to get 5.6T space from three nodes.
>
> All the machines are of same type (HP DL160 G7) with 48 of RAM and
> quad core dual cpu's.
>
> I am using iozone for testing the performance against NetApp filer
> Below is the command what I am using for iozone test
>
> /opt/iozone/bin/iozone -R -e -l i -u 1 -r 4096k -s 1024m -F /mnt/ceph-test/ceph.iozone

1. Make sure you have only 1 active MDS, multi-MDS is an extra
   complication you're better skipping right now.
2. What underlying filesystem are you using for the OSDs?
3. What Linux kernel version is running on the OSDs?
4. What machine is the client, running iozone? How is it connected to
   the others? Kernel client or FUSE?
5. What Linux kernel version is running on the client?

Going forward, in a setup like that you could put OSDs on the machines
that run ceph-mon; ceph-mon is very lightweight and doesn't need
dedicated machines in a small setup.

* Re: Performance test on Ceph cluster
From: madhusudhana @ 2012-02-23 7:12 UTC
To: ceph-devel

Tommi Virtanen <tommi.virtanen@dreamhost.com> writes:

>
> On Wed, Feb 22, 2012 at 01:39, madhusudhana
> <madhusudhana.u.acharya@gmail.com> wrote:
> > Hi
> > I have finally configured a ceph cluster with 8 nodes. I have 2 MDS
> > servers and 3 monitors and rest of 3 nodes are OSD. Each system has
> > 2T SATA drives. I have 3 partitions created, one for root file
> > system, another for CEPH journal and the rest of the space is for
> > OSD. I was able to get 5.6T space from three nodes.
> >
> > All the machines are of same type (HP DL160 G7) with 48 of RAM and
> > quad core dual cpu's.
> >
> > I am using iozone for testing the performance against NetApp filer
> > Below is the command what I am using for iozone test
> >
> > /opt/iozone/bin/iozone -R -e -l i -u 1 -r 4096k -s 1024m -F /mnt/ceph-test/ceph.iozone
>
> 1. Make sure you have only 1 active MDS, multi-MDS is an extra
> complication you're better skipping right now.
> 2. What underlying filesystem are you using for the OSDs?
> 3. What Linux kernel version is running on the OSDs?
> 4. What machine is the client, running iozone? How is it connected to
> the others? Kernel client or FUSE?
> 5. What Linux kernel version is running on the client?
>
> Going forward, in a setup like that you could put OSDs on the machines
> that run ceph-mon; ceph-mon is very lightweight and doesn't need
> dedicated machines in a small setup.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

1. can you please let me know how I can make only 1 MDS active ?
2. BTRFS for all OSD's
3. All hosts (including OSD) in my ceph cluster are running 3.0.9 ver
   [root@ceph-node-8 ~]# uname -r
   3.0.9
4. All 9 machines are replica of each other. I have imaged them using
   systemimager. Only difference is 9th node is not a part of CEPH
   cluster. I mounted ceph cluster to this node using mount -t ceph
   command
5. All 9 clients are running same version of CentOS and Kernel with
   1GigE interface

You mean to say, I can have ceph mon/OSD's running in the
same machine ? but, in ceph wiki, i have read that its better to
have different machines for each mds/mon/osd.

I assume that ceph uses whatever ethernet interface i have (1GigE)
in my system to load balance the cluster in case of node failure and
node addition. Won't this uses entire bandwidth during load
balancing ? won't this cause bandwidth saturation for clients ?

I would like to know what benchmark I should use to test CEPH ?
I want to present the data to my management how CEPH can perform when
compared with other file systems (like GlusterFS/NetApp/Lustre)

Thanks
Madhusudhana

* Re: Performance test on Ceph cluster
From: Tommi Virtanen @ 2012-02-23 18:05 UTC
To: madhusudhana; +Cc: ceph-devel

On Wed, Feb 22, 2012 at 23:12, madhusudhana
<madhusudhana.u.acharya@gmail.com> wrote:
> 1. can you please let me know how I can make only 1 MDS active ?

You can see that in "ceph -s" output, the "mds" line should have just
one entry like "0=a=up:active" with the word active.

You can control that with the "max mds" config option, and at runtime
with "ceph mds set_max_mds NUM" and "ceph mds stop ID".

Note, decreasing the number of active MDSes is not currently well
tested. You might be better off with a fresh cluster that only ever
ran one ceph-mds process.

> 2. BTRFS for all OSD's

There is currently one known case where btrfs's internal structures
get fragmented, and its performance starts degrading. You might want
to make sure you start your test with freshly-mkfs'ed btrfses.

> 3. All hosts (including OSD) in my ceph cluster are running 3.0.9 ver
> [root@ceph-node-8 ~]# uname -r
> 3.0.9

Well, that's at least in the 3.x series.. Btrfs has had a steady
stream of fixes, so in general we recommend running the latest stable
kernel. You might want to try that.

> 4. All 9 machines are replica of each other. I have imaged them using
> systemimager. Only difference is 9th node is not a part of CEPH
> cluster. I mounted ceph cluster to this node using mount -t ceph
> command

That's good.

> 5. All 9 clients are running same version of CentOS and Kernel with
> 1GigE interface

> You mean to say, I can have ceph mon/OSD's running in the
> same machine ? but, in ceph wiki, i have read that its better to
> have different machines for each mds/mon/osd.

Yes, I just wanted to make sure you have it set up like that.

> I assume that ceph uses whatever ethernet interface i have (1GigE)
> in my system to load balance the cluster in case of node failure and
> node addition. Won't this uses entire bandwidth during load
> balancing ? won't this cause bandwidth saturation for clients ?

Yes. That's why you can set up a separate network for cluster-internal
communication. See "cluster network" or "cluster addr" vs "public
network" or "public addr".

> I would like to know what benchmark I should use to test CEPH ?
> I want to present the data to my management how CEPH can perform when
> compared with other file systems (like GlusterFS/NetApp/Lustre)

You should use the benchmark that matches your actual workload best.

Please stay active on the mailing list until your results start
looking good. The more information you can provide, the better we can
help you.

We're looking forward to getting one of our new hires going, he'll be
benchmarking Ceph on pretty decent hardware & 10gig network with
whatever loads we can come up with. That should give you a better idea
of what to expect, and us what to keep working on.
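
The "ceph -s" check described above (exactly one entry marked up:active
on the "mds" line) can be scripted. The Python below is only a sketch,
assuming the status line has the form quoted in this thread, for example
"mds e5: 1/1/1 up {0=ceph-node-1=up:active}, 1 up:standby". The function
name active_mds is hypothetical, and other Ceph releases may print the
line differently.

```python
import re

# Matches entries such as "0=ceph-node-1=up:active" (rank=name=state).
# Standby counts like "1 up:standby" have no rank=name prefix and are skipped.
_ACTIVE = re.compile(r"(\d+)=([^=\s,{}]+)=up:active")


def active_mds(mds_line):
    """Return the (rank, name) pairs marked up:active on a 'ceph -s' mds line."""
    return [(int(rank), name) for rank, name in _ACTIVE.findall(mds_line)]


if __name__ == "__main__":
    # Status line as posted in this thread.
    line = "mds e5: 1/1/1 up {0=ceph-node-1=up:active}, 1 up:standby"
    active = active_mds(line)
    print(active)
    if len(active) != 1:
        print("warning: expected exactly one active MDS")
```

Feeding it the "mds" line from a real cluster would show at a glance
whether more than one MDS is active before a benchmark run starts.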

* Re: Performance test on Ceph cluster
From: madhusudhana @ 2012-02-24 8:58 UTC
To: ceph-devel

Tommi Virtanen <tommi.virtanen@dreamhost.com> writes:

>
> On Wed, Feb 22, 2012 at 23:12, madhusudhana
> <madhusudhana.u.acharya@gmail.com> wrote:
> > 1. can you please let me know how I can make only 1 MDS active ?
>
> You can see that in "ceph -s" output, the "mds" line should have just
> one entry like "0=a=up:active" with the word active.
>
> You can control that with the "max mds" config option, and at runtime
> with "ceph mds set_max_mds NUM" and "ceph mds stop ID".
>
> Note, decreasing the number of active MDSes is not currently well
> tested. You might be better off with a fresh cluster, that only ever
> ran one ceph-mds process.
>
> > 2. BTRFS for all OSD's
>
> There is currently one known case where btrfs's internal structures
> get fragmented, and its performance starts degrading. You might want
> to make sure you start your test with freshly-mkfs'ed btrfses.
>
> > 3. All hosts (including OSD) in my ceph cluster are running 3.0.9 ver
> > [root@ceph-node-8 ~]# uname -r
> > 3.0.9
>
> Well, that's at least in the 3.x series.. Btrfs has had a steady
> stream of fixes, so in general we recommend running the latest stable
> kernel. You might want to try that.
>
> > 4. All 9 machines are replica of each other. I have imaged them using
> > systemimager. Only difference is 9th node is not a part of CEPH
> > cluster. I mounted ceph cluster to this node using mount -t ceph
> > command
>
> That's good.
>
> > 5. All 9 clients are running same version of CentOS and Kernel with
> > 1GigE interface
>
> > You mean to say, I can have ceph mon/OSD's running in the
> > same machine ? but, in ceph wiki, i have read that its better to
> > have different machines for each mds/mon/osd.
>
> Yes, I just wanted to make sure you have it set up like that.
>
> > I assume that ceph uses whatever ethernet interface i have (1GigE)
> > in my system to load balance the cluster in case of node failure and
> > node addition. Won't this uses entire bandwidth during load
> > balancing ? won't this cause bandwidth saturation for clients ?
>
> Yes. That's why you can set up a separate network for cluster-internal
> communication. See "cluster network" or "cluster addr" vs "public
> network" or "public addr".
>
> > I would like to know what benchmark I should use to test CEPH ?
> > I want to present the data to my management how CEPH can perform when
> > compared with other file systems (like GlusterFS/NetApp/Lustre)
>
> You should use the benchmark that matches your actual workload best.
>
> Please stay active on the mailing list until your results start
> looking good. The more information you can provide, the better we can
> help you.
>
> We're looking forward to get one of our new hires going, he'll be
> benchmarking Ceph on pretty decent hardware & 10gig network with
> whatever loads we can come up with. That should give you a better idea
> of what to expect, and us what to keep working on.
>

Thank you Tommi for your response.

1. In my cluster, all OSD's are mkfs'ed with btrfs
2. Below is what i can see with ceph -s output. Does that mean only one
   MDS is operational and the other one is standby ?
   mds e5: 1/1/1 up {0=ceph-node-1=up:active}, 1 up:standby
3. I will not be able to use new stable kernel because of company
   policy :-(
4. If you don't mind, can you please give me a bit of insight on cluster
   network, what it is and how i can configure one for my ceph cluster ?
   Will there be a significant performance improvement with this ?
5. I have done some testing with dd on ceph. Below are the results

CASE 1:
[root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=1000000
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 4.04089 seconds, 1.0 GB/s

CASE 2:
[root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=10000000
10000000+0 records in
10000000+0 records out
40960000000 bytes (41 GB) copied, 445.786 seconds, 91.9 MB/s

CASE 3:
[root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k count=100000000
71414032+0 records in
71414032+0 records out
292511875072 bytes (293 GB) copied, 4116.59 seconds, 71.1 MB/s

As you can see from above output, for 4G file of 4k blocks, speed clocked at
1GB/s, it gradually decreased when i increased the file size above 10G.
And also, if i run back to back dd with CASE 1 option, the write will
slow down from 1GB/s to 90MB/s.

can you please explain whether this behaviour is expected ? if yes, why ?
if not, how i can achieve 1GB/s for all file sizes ?
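
As a rough sanity check on the three dd results above, the reported
rates can be compared with what a single 1GigE client link can carry.
The Python sketch below assumes a theoretical ceiling of 125 MB/s for
1 Gbit/s (protocol overhead ignored, so the real ceiling is lower); the
names GIGE_BYTES_PER_SEC, DD_RUNS, rate and exceeds_link are
hypothetical. A rate above the ceiling cannot have crossed the wire in
the measured time, which points at client-side caching rather than real
cluster throughput.

```python
# 1 Gbit/s expressed in bytes per second, with no protocol overhead.
GIGE_BYTES_PER_SEC = 1_000_000_000 / 8

# (bytes copied, elapsed seconds) exactly as reported by dd above.
DD_RUNS = {
    "CASE 1": (4_096_000_000, 4.04089),
    "CASE 2": (40_960_000_000, 445.786),
    "CASE 3": (292_511_875_072, 4116.59),
}


def rate(nbytes, seconds):
    """Average throughput in bytes per second."""
    return nbytes / seconds


def exceeds_link(nbytes, seconds, link=GIGE_BYTES_PER_SEC):
    """True if the reported rate is higher than the link could carry."""
    return rate(nbytes, seconds) > link


if __name__ == "__main__":
    for name, (nbytes, seconds) in DD_RUNS.items():
        mb_s = rate(nbytes, seconds) / 1e6
        verdict = ("faster than the link" if exceeds_link(nbytes, seconds)
                   else "within link speed")
        print(f"{name}: {mb_s:.1f} MB/s ({verdict})")
```

Only CASE 1 comes out above the link ceiling; the two longer runs fall
below it, which fits a picture where the short run mostly measures
memory on the client.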

* Re: Performance test on Ceph cluster
From: Tommi Virtanen @ 2012-02-24 18:51 UTC
To: madhusudhana; +Cc: ceph-devel

On Fri, Feb 24, 2012 at 00:58, madhusudhana
<madhusudhana.u.acharya@gmail.com> wrote:
> 1. In my cluster, all OSD's are mkfs'ed with btrfs
> 2. Below is what i can see with ceph -s output. Is that mean, only one MDS
> is operation and another one is standby ?
> mds e5: 1/1/1 up {0=ceph-node-1=up:active}, 1 up:standby

Yes, you have 1 active and 1 standby mds.

> 3. I will not be able to use new stable kernel bcz of company policy :-(

That might become an issue.

> 4. If you don't mind, can you please give me a bit of insight on cluster
> network, what it is and how i can configure one for my ceph cluster ?
> Will there be a significant performance improvement with this ?

When a client submits a write to Ceph, it needs to be replicated
(usually to two replicas). If all the ceph servers have two network
interfaces, and you have two separate networks you connect the servers
to, you can make the replication traffic go over the second interface,
and thus you'll have more available bandwidth between the cluster and
the clients in the first one.

Or you could just bond two 1 gig links, or you could buy 10gig gear.

> 5. I have done some testing with dd on ceph. Below are the results
>
> CASE 1:[root@ceph-node-9 ~]# dd if=/dev/zero of=/mnt/ceph-test/wtest bs=4k
> count=1000000
...
> As you can see from above output, for 4G file of 4k blocks, speed clocked at
> 1GB/s, it gradually decreased when i increased the file size above 10G.
> And also, if i run back to back dd with CASE 1 option, the write will
> slow down from 1GB/s to 90MB/s.
>
> can you please explain whether this behaviour is expected ? if yes, why ?
> if not, how i can achieve 1GB/s for all file sizes ?

dd is not a very good benchmark. The 4GB write is small enough to
typically be (mostly) stored in RAM even on your client machine -- your
benchmark might not even be sending the data over the network yet!

You can get a little bit better by adding conv=fsync to your dd
command lines, that makes sure the data is written to disks before
claiming completion.

In general, you should look for a benchmark that is closer to your workload.
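
The conv=fsync suggestion above amounts to stopping the clock only
after the data has been flushed to stable storage. A minimal Python
sketch of the same idea follows. timed_write is a hypothetical helper,
the file it writes in the demo lives in a temporary directory rather
than on a Ceph mount, and this is an illustration of the timing
difference, not a replacement for a workload-matched benchmark.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, block_size=4096, sync=True):
    """Write total_bytes of zeros to path; return throughput in bytes/second.

    With sync=True the clock stops only after os.fsync() returns, which is
    roughly what dd does with conv=fsync. With sync=False the data may still
    sit in the page cache when the measurement ends.
    """
    block = bytes(block_size)
    remaining = total_bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = block[:min(block_size, remaining)]
            f.write(chunk)
            remaining -= len(chunk)
        if sync:
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push it to storage
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "wtest")
        size = 64 * 1024 * 1024  # small demo size; real tests should exceed RAM
        for sync in (False, True):
            mb_s = timed_write(target, size, sync=sync) / 1e6
            print(f"sync={sync}: {mb_s:.1f} MB/s")
```

Pointing the path at a mounted filesystem and using a size larger than
client RAM would bring the measurement closer to what the cluster can
sustain.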

* Re: Performance test on Ceph cluster
From: Guido Winkelmann @ 2012-03-07 11:23 UTC
To: ceph-devel

On Friday, 24 February 2012, 10:51:10, Tommi Virtanen wrote:
> On Fri, Feb 24, 2012 at 00:58, madhusudhana
> <madhusudhana.u.acharya@gmail.com> wrote:
> > 4. If you don't mind, can you please give me a bit of insight on cluster
> > network, what it is and how i can configure one for my ceph cluster ?
> > Will there be a significant performance improvement with this ?
>
> When a client submits a write to Ceph, it needs to be replicated
> (usually to two replicas). If all the ceph servers have two network
> interfaces, and you have two separate networks you connect the servers
> to, you can make the replication traffic go over the second interface,
> and thus you'll have more available bandwidth between the cluster and
> the clients in the first one.

That sounds really interesting, any pointers on how that can be configured?

As far as I can see, this is going to mean I will have to assign two IP
addresses to each node, one for internal traffic, one for external...

Regards,
Guido

* Re: Performance test on Ceph cluster
From: Wido den Hollander @ 2012-03-07 11:31 UTC
To: Guido Winkelmann; +Cc: ceph-devel

Hi,

On 03/07/2012 12:23 PM, Guido Winkelmann wrote:
> On Friday, 24 February 2012, 10:51:10, Tommi Virtanen wrote:
>> On Fri, Feb 24, 2012 at 00:58, madhusudhana
>> <madhusudhana.u.acharya@gmail.com> wrote:
>>> 4. If you don't mind, can you please give me a bit of insight on cluster
>>> network, what it is and how i can configure one for my ceph cluster ?
>>> Will there be a significant performance improvement with this ?
>>
>> When a client submits a write to Ceph, it needs to be replicated
>> (usually to two replicas). If all the ceph servers have two network
>> interfaces, and you have two separate networks you connect the servers
>> to, you can make the replication traffic go over the second interface,
>> and thus you'll have more available bandwidth between the cluster and
>> the clients in the first one.
>
> That sounds really interesting, any pointers on how that can be configured?

You should take a look at:

* public_addr
* cluster_addr
* public_network
* cluster_network

(From: src/common/config_opts.h)

[osd]
cluster network = 192.168.0.0/16
public network = 172.16.0.0/16

[osd.1]
cluster addr = 192.168.0.1
public addr = 172.16.0.1

I haven't played with those options, but that should do it (I think)

Wido

>
> As far as I can see, this is going to mean I will have to assign two IP
> addresses to each node, one for internal traffic, one for external...
>
> Regards,
> Guido

* Re: Performance test on Ceph cluster
From: Tommi Virtanen @ 2012-03-07 18:09 UTC
To: Wido den Hollander; +Cc: Guido Winkelmann, ceph-devel

On Wed, Mar 7, 2012 at 03:31, Wido den Hollander <wido@widodh.nl> wrote:
> You should take a look at:
>
> * public_addr
> * cluster_addr
> * public_network
> * cluster_network
>
> (From: src/common/config_opts.h)
>
> [osd]
> cluster network = 192.168.0.0/16
> public network = 172.16.0.0/16
>
> [osd.1]
> cluster addr = 192.168.0.1
> public addr = 172.16.0.1
>
> I haven't played with those options, but that should do it (I think)

You don't need both "addr" and "network", just one will do. The
"network" variant was added so you don't need any per-node
configuration entries (making it easier to add/remove nodes). It'll
look for the IP addresses the server has, find one that matches that
subnet, and bind to that one.
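
The subnet matching described above can be illustrated with Python's
standard ipaddress module. This is only a sketch of the selection logic
as described in this message, not Ceph's actual implementation;
pick_bind_address is a hypothetical helper, and the addresses are the
example values quoted from the config earlier in the thread.

```python
import ipaddress


def pick_bind_address(local_addrs, network):
    """Return the first local address that falls inside `network`, else None.

    local_addrs: iterable of IP address strings configured on the host.
    network:     subnet in CIDR notation, e.g. "192.168.0.0/16".
    """
    net = ipaddress.ip_network(network)
    for addr in local_addrs:
        if ipaddress.ip_address(addr) in net:
            return addr
    return None


if __name__ == "__main__":
    # A host with one address on each of the two example networks.
    host_addrs = ["172.16.0.1", "192.168.0.1"]
    print("cluster:", pick_bind_address(host_addrs, "192.168.0.0/16"))
    print("public: ", pick_bind_address(host_addrs, "172.16.0.0/16"))
```

Because the choice depends only on which addresses a host already has,
the same two "network" lines can be shared by every node, which is the
convenience the "network" variant is meant to provide.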