* Re: Infiniband 40GB
From: Alexandre DERUMIER @ 2012-06-07 3:31 UTC
To: Mark Nelson; +Cc: Amon Ott, ceph-devel, Yann Dupont
(36+ messages in thread)

Hi again,

I have run some tests with the journals on a real disk, and I see the same behaviour.

iostat shows constant writes to the journal and to the data disks at the same time, from the very beginning of the benchmark.

Maybe I could try a separate partition for each journal?
(Currently I have 1 partition holding 5 journal files, one per OSD.)

-Alexandre

----- Original Message -----

From: "Alexandre DERUMIER" <aderumier@odiso.com>
To: "Mark Nelson" <mark.nelson@inktank.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Thursday, 7 June 2012 05:11:15
Subject: Re: Infiniband 40GB

Hi Mark,

I have attached a blktrace of /dev/sdb1 on node1 (osd.0),
and also iostat output (showing constant writes).

Bench used:

rados -p pool3 bench 60 write -t 16

Kernel used: 3.4 from Inktank

I'll run tests with the journal on an xfs partition today.

----- Original Message -----

From: "Mark Nelson" <mark.nelson@inktank.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Wednesday, 6 June 2012 18:43:50
Subject: Re: Infiniband 40GB

Hi Alexandre,

If you can run blktrace during your test on one of the OSD data disks
and send me the results, I can take a look at them. The rados bench
settings and output would be useful too.

Thanks,
Mark

On 6/6/12 11:05 AM, Alexandre DERUMIER wrote:
> Hi, I have rebuilt my cluster with Ubuntu precise:
>
> - kernel 3.2
> - ceph 0.47.2
> - libc6 2.15
> - 3 nodes, 5 OSDs (xfs) per node and 1 tmpfs with 5 journal files
>
> I launched rados bench,
> and I again see constant writes to xfs...
>
> Maybe this is related to tmpfs?
>
> I'll retry with kernel 3.4 from Inktank tomorrow.
> I'll also try with the journal on a physical disk with an xfs partition.
>
> I'll keep you posted.
>
> ----- Original Message -----
>
> From: "Mark Nelson" <mark.nelson@inktank.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
> Sent: Monday, 4 June 2012 14:59:58
> Subject: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently running some tests with xfs on Debian wheezy with the standard libc6 (2.11.3-3) and a 3.2 kernel.
>>
>> Watching iostat (3 nodes with 5 OSDs), I see constant writes to the disks (it looks as though the data is flushed from the journal to disk every second).
>>
>> The journal is big enough (20GB tmpfs) to handle 30s of writes.
>>
>> Do you think it's related to the missing syncfs() support?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on an XFS OSD. One shows Ubuntu oneiric and the
> other precise (i.e. no syncfs support vs. syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly, the seeks/second don't seem to drop much,
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark

--
Alexandre Derumier
Systems Engineer
Landline: 03 20 68 88 90
Fax: 03 20 68 90 81
45 Bvd du Général Leclerc 59100 Roubaix - France
12 rue Marivaux 75002 Paris - France

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
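
The libc question raised in the message above (2.11.3-3 on Debian versus 2.15 on Ubuntu precise) can be checked mechanically. What follows is only a minimal sketch: it assumes that glibc 2.14 is the first release that ships a syncfs() wrapper, and the function names `has_syncfs_wrapper` and `glibc_version` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: decide whether a glibc version string is at least 2.14,
# the release assumed here to be the first with a syncfs() wrapper.

# has_syncfs_wrapper VERSION
# Returns 0 when VERSION (e.g. "2.15" or "2.11.3-3") is >= 2.14.
has_syncfs_wrapper() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of that remainder
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "${minor:-0}" -ge 14 ]; }
}

# glibc_version prints the running libc version, e.g. "2.15".
# getconf GNU_LIBC_VERSION prints a line such as "glibc 2.15".
glibc_version() {
    getconf GNU_LIBC_VERSION | awk '{print $2}'
}
```

A possible use is `has_syncfs_wrapper "$(glibc_version)" && echo "libc new enough for syncfs()"`. This only inspects the installed libc; it says nothing about whether the ceph packages in use were built against it.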
* Re: Infiniband 40GB
From: Alexandre DERUMIER @ 2012-06-07 11:25 UTC
To: Mark Nelson; +Cc: Amon Ott, ceph-devel, Yann Dupont

Other tests done today (kernel 3.4, Ubuntu precise):

3 nodes with 5 OSDs, btrfs, 1GB journal in tmpfs, forced to writeahead
3 nodes with 1 OSD, xfs, 8GB journal in tmpfs
3 nodes with 1 OSD, btrfs, 8GB journal in tmpfs, forced to writeahead

3 nodes with 5 OSDs, btrfs, 20GB journal on disk, forced to writeahead
3 nodes with 1 OSD, xfs, 20GB journal on disk
3 nodes with 1 OSD, btrfs, 20GB journal on disk, forced to writeahead

Same behaviour in all cases: writes to disk are constant.

Benchmarked with:

rados -p pool3 bench 60 write -t 16

and also with fio and bonnie, doing random and sequential writes from a guest VM with different block sizes.
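
The measurements discussed in this part of the thread (blktrace on an OSD data disk and iostat, both taken while rados bench runs) could be collected together roughly as follows. This is a sketch under assumptions: the device and pool are the ones quoted in the thread, while the output names `osd-trace` and `iostat.log` and the `DRY_RUN` switch are invented for illustration.

```shell
#!/bin/sh
# Sketch only: trace one OSD data disk while rados bench runs.
# DEV, POOL and the output file names are placeholders.

DEV=${DEV:-/dev/sdb1}
POOL=${POOL:-pool3}
SECS=${SECS:-60}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands (default)

capture_bench() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "blktrace -d $DEV -o osd-trace -w $SECS"
        echo "iostat -x 1 $SECS"
        echo "rados -p $POOL bench $SECS write -t 16"
        return 0
    fi
    # Start both collectors in the background, then run the benchmark.
    blktrace -d "$DEV" -o osd-trace -w "$SECS" &
    iostat -x 1 "$SECS" > iostat.log &
    rados -p "$POOL" bench "$SECS" write -t 16
    wait    # let blktrace and iostat finish their SECS-second window
}
```

Calling `capture_bench` as-is only prints the three commands; setting `DRY_RUN=0` runs them, which needs root for blktrace and a reachable cluster.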
* Re: Infiniband 40GB
From: Mark Nelson @ 2012-06-07 17:15 UTC
To: Alexandre DERUMIER; +Cc: Amon Ott, ceph-devel, Yann Dupont

On 6/7/12 6:25 AM, Alexandre DERUMIER wrote:
> Other tests done today (kernel 3.4, Ubuntu precise):
>
> 3 nodes with 5 OSDs, btrfs, 1GB journal in tmpfs, forced to writeahead
> 3 nodes with 1 OSD, xfs, 8GB journal in tmpfs
> 3 nodes with 1 OSD, btrfs, 8GB journal in tmpfs, forced to writeahead
>
> 3 nodes with 5 OSDs, btrfs, 20GB journal on disk, forced to writeahead
> 3 nodes with 1 OSD, xfs, 20GB journal on disk
> 3 nodes with 1 OSD, btrfs, 20GB journal on disk, forced to writeahead
>
> Same behaviour in all cases: writes to disk are constant.
>
> Benchmarked with:
> rados -p pool3 bench 60 write -t 16
>
> and also with fio and bonnie, doing random and sequential writes from a guest VM with different block sizes.

Hi Alexandre,

I'll try to take a look at the data you sent me later today. Thanks!

Mark
* Infiniband 40GB
From: Stefan Priebe @ 2012-06-03 8:10 UTC
To: ceph-devel@vger.kernel.org

Hi List,

has anybody already tried Ceph over 40 Gbit/s InfiniBand?

Stefan
* Re: Infiniband 40GB
From: Mark Nelson @ 2012-06-03 12:56 UTC
To: Stefan Priebe; +Cc: ceph-devel@vger.kernel.org

On 6/3/12 3:10 AM, Stefan Priebe wrote:
> Hi List,
>
> has anybody already tried CEPH over Infiniband 40GB?
>
> Stefan

Hi Stefan,

A couple of folks have done DDR IB. For now you are limited to IPoIB,
though. If you have the hardware available, I'd be really curious what
kind of throughput and latencies you see.

Mark
* Re: Infiniband 40GB
From: Hannes Reinecke @ 2012-06-04 6:22 UTC
To: Mark Nelson; +Cc: Stefan Priebe, ceph-devel@vger.kernel.org

On 06/03/2012 02:56 PM, Mark Nelson wrote:
> On 6/3/12 3:10 AM, Stefan Priebe wrote:
>> Hi List,
>>
>> has anybody already tried CEPH over Infiniband 40GB?
>>
>> Stefan
>
> Hi Stefan,
>
> A couple of folks have done DDR IB. For now you are limited to
> ipoib though. If you have the hardware available I'd be really
> curious what kind of throughput/latencies you see.
>
Hehe.
Good luck with that.

We've tried on 10GigE with _disastrous_ results.
Up to the point where 1GigE was actually _faster_.

So far we've uncovered two issues:
- intel_idle was/is seriously broken
  (we tried on 3.0-stable, so it might have been fixed by now)
- osd-server is calling 'fsync' on each and every write request.
  Does wonders for performance ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: Infiniband 40GB
From: Stefan Priebe - Profihost AG @ 2012-06-04 7:26 UTC
To: Hannes Reinecke; +Cc: Mark Nelson, ceph-devel@vger.kernel.org

On 04.06.2012 08:22, Hannes Reinecke wrote:
> Hehe.
> Good luck with that.
>
> We've tried on 10GigE with _disastrous_ results.
> Up to the point where 1GigE was actually _faster_.

So you mean you've tried 10GbE, or 10Gb IPoIB over InfiniBand?

> - osd-server is calling 'fsync' on each and every write request.
> Does wonders for performance ...

Have you already talked to the Ceph guys?

Stefan
* Re: Infiniband 40GB
From: Hannes Reinecke @ 2012-06-04 7:39 UTC
To: Stefan Priebe - Profihost AG; +Cc: Mark Nelson, ceph-devel@vger.kernel.org

On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
> On 04.06.2012 08:22, Hannes Reinecke wrote:
>> Hehe.
>> Good luck with that.
>>
>> We've tried on 10GigE with _disastrous_ results.
>> Up to the point where 1GigE was actually _faster_.
>
> So you mean you've tried 10GBE or 10GB ipoib with Infiniband?
>
>> - osd-server is calling 'fsync' on each and every write request.
>> Does wonders for performance ...
> Already talked to the ceph guys?
>
Still not there yet.
We still need to figure out the exact details; performance
regressions are notoriously hard to track.

But yeah, rumours have it we are in contact.
Project management on our side could be improved, though ;)

Cheers,

Hannes
* Re: Infiniband 40GB
From: Stefan Priebe - Profihost AG @ 2012-06-04 7:53 UTC
To: Hannes Reinecke; +Cc: Mark Nelson, ceph-devel@vger.kernel.org

On 04.06.2012 09:39, Hannes Reinecke wrote:
> On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
>> On 04.06.2012 08:22, Hannes Reinecke wrote:
>>> Hehe.
>>> Good luck with that.
>>>
>>> We've tried on 10GigE with _disastrous_ results.
>>> Up to the point where 1GigE was actually _faster_.
>>
>> So you mean you've tried 10GBE or 10GB ipoib with Infiniband?

Could you please answer this question too? Thanks.

Cheers,
Stefan
* Re: Infiniband 40GB
From: Hannes Reinecke @ 2012-06-04 8:02 UTC
To: Stefan Priebe - Profihost AG; +Cc: Mark Nelson, ceph-devel@vger.kernel.org

On 06/04/2012 09:53 AM, Stefan Priebe - Profihost AG wrote:
> On 04.06.2012 09:39, Hannes Reinecke wrote:
>> On 06/04/2012 09:26 AM, Stefan Priebe - Profihost AG wrote:
>>> On 04.06.2012 08:22, Hannes Reinecke wrote:
>>>> Hehe.
>>>> Good luck with that.
>>>>
>>>> We've tried on 10GigE with _disastrous_ results.
>>>> Up to the point where 1GigE was actually _faster_.
>>>
>>> So you mean you've tried 10GBE or 10GB ipoib with Infiniband?
>
> Could you please answer this question too? Thx.
>
This was plain 10GigE, i.e. TCP/IP. Not InfiniBand, I'm afraid.

However, given that our problems were not related to the actual
transport, I'd be very much surprised if they did not occur on
InfiniBand.

And I would _definitely_ like to hear if someone managed to get any
decent speed (notably write speed) on fast interconnects.
There's always a chance we've messed things up and were just
measuring our crap setup ...

Cheers,

Hannes
* Re: Infiniband 40GB
From: Stefan Majer @ 2012-06-04 8:23 UTC
To: Hannes Reinecke
Cc: Stefan Priebe - Profihost AG, Mark Nelson, ceph-devel@vger.kernel.org

Hi Hannes,

our production environment is running on 10GbE infrastructure. We had a
lot of trouble before we got to where we are today.
We use Intel X520 D2 cards on our OSDs and Nexus switch
infrastructure. All other cards we tested failed horribly.

Some of the problems we encountered:
- page allocation failures in the ixgbe driver --> fixed upstream
- problems with jumbo frames; we had to disable tso, gro and lro -->
  this is the most obscure thing
- various tuning via sysctl in the net.tcp and net.ipv4 area --> this
  was also the outcome of Stefan's benchmarking odyssey.

But after all this we are actually quite happy and are only limited by
the speed of the drives (2TB SATA).
The fsync is in fact an fdatasync, which is available in newer glibc. If
you don't use btrfs (we use xfs) you need to use a recent glibc with
fdatasync support.

Hope this helps.

Greetings
Stefan

On Mon, Jun 4, 2012 at 10:02 AM, Hannes Reinecke <hare@suse.de> wrote:
> This was plain 10GigE, ie TCP/IP. Not infiniband, I'm afraid.
>
> However, given that our problems have not been related to the actual
> transport I'd be very much surprised if they would not occur on
> Infiniband.
>
> And I would _definitely_ like to hear if someone managed to get any
> decent speed (notably write speed) on fast interconnects.
> There's always a chance we've messed things up and were just
> measuring our crap setup ...
>
> Cheers,
>
> Hannes

--
Stefan Majer
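
The offload change mentioned in the message above (disabling tso, gro and lro) is normally done with `ethtool -K`. A minimal sketch, assuming a placeholder interface name since the message does not name one; `disable_offloads` and the `DRY_RUN` switch are illustrative and not taken from the thread.

```shell
#!/bin/sh
# Sketch only: switch off TSO, GRO and LRO on one interface.
# The interface name is a placeholder; DRY_RUN=1 (default) only prints the command.

DRY_RUN=${DRY_RUN:-1}

disable_offloads() {
    iface=$1
    if [ "$DRY_RUN" = 1 ]; then
        echo "ethtool -K $iface tso off gro off lro off"
    else
        ethtool -K "$iface" tso off gro off lro off
    fi
}
```

For example, `disable_offloads eth2` prints the command it would run. The current offload state of an interface can be listed with `ethtool -k eth2` (lower-case `-k`). Settings made this way do not survive a reboot unless they are also added to the distribution's network configuration.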
* Re: Infiniband 40GB
From: Yann Dupont @ 2012-06-04 9:21 UTC
To: Stefan Majer
Cc: Hannes Reinecke, Stefan Priebe - Profihost AG, Mark Nelson, ceph-devel@vger.kernel.org

On 04/06/2012 10:23, Stefan Majer wrote:
> Hi Hannes,
>
> our production environment is running on 10GB infrastructure. We had a
> lot of troubles till we got to where we are today.
> We use Intel X520 D2 cards on our OSD´s and nexus switch
> infrastructure. All other cards we where testing failed horrible.
>
We have Intel Corporation 82599EB 10 Gigabit Dual Port Backplane
Connection (rev 01)... I don't know the 'commercial name'. ixgbe driver.

> Some of the problems we encountered have been:
> - page allocation failures in the ixgbe driver --> fixed in upstream
> - problems with jumbo frames, we had to disable tso, gro, lro -- >
> this is the most obscure thing
> - various tuning via sysctl in the net.tcp and net.ipv4 area --> this
> was also the outcome of stefan´s benchmarking odysee.

Some tuning we did:

-> Turning off the virtualisation extensions in the BIOS. We don't know
why, but leaving them on gave us crappy performance. We usually turn
them on, because we use KVM a lot. In our case, the OSDs run on bare
metal and disabling the virtualisation extensions gives us a very big
boost. It may be a BIOS bug in our machines (DELL M610).

-> One of my colleagues played with receive flow steering; the Intel
card supports multiple queues, so it seems we can gain a little with it:

#!/bin/sh

for x in $(seq 0 23); do echo FFFFFFFF > /sys/class/net/eth2/queues/rx-${x}/rps_cpus; done
echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
for x in $(seq 0 23); do echo 16384 > /sys/class/net/eth2/queues/rx-${x}/rps_flow_cnt; done

>
> But after all this we a quite happy actully and are only limited by
> the speed of the drives (2TB SATA).
> The fsync is a fdatasync in fact which is available in newer glibc. If
> you dont use btrfs (we use xfs) you need to use a recent glibc with
> fdatasync support.

Might that explain why we see lousy performance with xfs right now?
That's the main reason we're stuck with btrfs for the moment.

We're using Debian 'stable': libc is
libc6 2.11.3-3
Probably too old?

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
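
The receive-flow-steering script in the message above hard-codes 24 rx queues on eth2. A variant that walks whatever rx queues the interface actually exposes might look like the sketch below. The function name `configure_rps` and the `SYSFS_NET` / `PROC_NET_CORE` overrides are assumptions added for illustration (they allow a dry run against a scratch directory); the values written are the ones from the original script.

```shell
#!/bin/sh
# Sketch only: apply the RPS/RFS values to every rx queue that exists.
# SYSFS_NET and PROC_NET_CORE can point at a scratch tree for a dry run.

# configure_rps IFACE CPU_MASK FLOW_ENTRIES
configure_rps() {
    iface=$1
    mask=$2
    flows=$3
    net=${SYSFS_NET:-/sys/class/net}
    core=${PROC_NET_CORE:-/proc/sys/net/core}

    # Global size of the socket flow table.
    echo "$flows" > "$core/rps_sock_flow_entries"

    # Per-queue CPU mask and flow count, for each rx queue present.
    for q in "$net/$iface"/queues/rx-*; do
        [ -d "$q" ] || continue
        echo "$mask" > "$q/rps_cpus"
        echo "$flows" > "$q/rps_flow_cnt"
    done
}
```

Run as root, `configure_rps eth2 FFFFFFFF 16384` would reproduce the settings of the original script on however many queues eth2 has.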
* Re: Infiniband 40GB
From: Alexandre DERUMIER @ 2012-06-04 9:35 UTC
To: Yann Dupont
Cc: Hannes Reinecke, Stefan Priebe - Profihost AG, Mark Nelson, ceph-devel, Stefan Majer

Hi,
about this:

>> Turning off Virtualisation extension in BIOS. Don't know why, but it
>> gaves us crappy performance. We usually put it on, because we use KVM a
>> lot. In our case, OSD are in bare metal and disabling virtualisation
>> extension gives us a very big boost.
>> It may be a BIOS bug in our machines (DELL M610).

It could be related to the IOMMU, if you pass intel_iommu=on in grub.
I have already had this kind of problem.

With intel_iommu=on, Linux (completely independently of KVM) adds a level
of protection that does not exist without an IOMMU. Without an IOMMU the
network card can write (via DMA) to any memory location; with it, the
card can only write to the memory locations the OS intended it to write
to. Theoretically, this can protect the OS against various kinds of
attacks. But it also means that every time Linux passes a new buffer to
the card, it has to change the IOMMU mappings. Unfortunately, this
noticeably slows down I/O.

-Alexandre
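
Whether a machine was booted with the parameter discussed above can be read from the kernel command line. A minimal sketch, assuming the check is only for the literal token `intel_iommu=on`; the function name `iommu_enabled_in` is made up, and the path argument exists so the check can also be tried on a saved copy of a command line.

```shell
#!/bin/sh
# Sketch only: look for the literal token intel_iommu=on in a kernel
# command line file (normally /proc/cmdline).

# iommu_enabled_in FILE
# Returns 0 if FILE contains intel_iommu=on as a whole parameter.
iommu_enabled_in() {
    # One parameter per line, then match the whole line exactly.
    tr ' ' '\n' < "$1" | grep -qx 'intel_iommu=on'
}
```

For example: `iommu_enabled_in /proc/cmdline && echo "booted with intel_iommu=on"`. A negative result only shows that the parameter was not passed explicitly.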
* Re: Infiniband 40GB
From: Yann Dupont @ 2012-06-04 9:53 UTC
To: Alexandre DERUMIER
Cc: Hannes Reinecke, Stefan Priebe - Profihost AG, Mark Nelson, ceph-devel, Stefan Majer

On 04/06/2012 11:35, Alexandre DERUMIER wrote:
> Hi,
> about this:
>>> Turning off Virtualisation extension in BIOS. Don't know why, but it
>>> gaves us crappy performance. We usually put it on, because we use KVM a
>>> lot. In our case, OSD are in bare metal and disabling virtualisation
>>> extension gives us a very big boost.
>>> It may be a BIOS bug in our machines (DELL M610).
>
> It could be related to iommu, if you pass intel_iommu=on in grub.
> I have already had this kind of problem.
>
> When intel_iommu=on, Linux (completely unrelated to KVM) adds a new level
> of protection which didn't exist without an IOMMU - the network card, which
> without an IOMMU could write (via DMA) to any memory location, now is
> not allowed - the card can only write to memory locates which the OS
> wanted it to write. Theoretically, this can protect the OS against
> various kinds of attacks. But what happens now is that every time that
> Linux passes a new buffer to the card, it needs to change the IOMMU
> mappings. This noticably slows down I/O, unfortunately.
>
Unfortunately, this is not the case. The Intel card supports it, but the
DELL M610 doesn't. And I just checked: our Linux command line doesn't
include intel_iommu=on.

BTW, it seems that turning on virtualization in the BIOS kills
performance with the in-kernel ixgbe driver. The one from SourceForge
seems less affected. Our tests were done around kernel 3.2; it may have
changed since.

Cheers,

--
Yann Dupont
* Re: Infiniband 40GB
From: Amon Ott @ 2012-06-04 9:47 UTC
To: Yann Dupont
Cc: ceph-devel

[-- Attachment #1: Type: text/plain, Size: 3097 bytes --]

On Monday 04 June 2012 you wrote:
> On 04/06/2012 10:23, Stefan Majer wrote:
> > Hi Hannes,
> >
> > our production environment is running on 10GB infrastructure. We had a
> > lot of troubles till we got to where we are today.
> > We use Intel X520 D2 cards on our OSDs and nexus switch
> > infrastructure. All other cards we were testing failed horribly.
>
> we have Intel Corporation 82599EB 10 Gigabit Dual Port Backplane
> Connection (rev 01)... Don't know the 'commercial name'. ixgbe driver.
>
> > Some of the problems we encountered have been:
> > - page allocation failures in the ixgbe driver --> fixed in upstream
> > - problems with jumbo frames, we had to disable tso, gro, lro -->
> >   this is the most obscure thing
> > - various tuning via sysctl in the net.tcp and net.ipv4 area --> this
> >   was also the outcome of stefan's benchmarking odyssey.
>
> some tuning we made:
>
> -> Turning off Virtualisation extension in BIOS. Don't know why, but it
> gave us crappy performance. We usually put it on, because we use KVM a
> lot. In our case, OSD are in bare metal and disabling virtualisation
> extension gives us a very big boost.
> It may be a BIOS bug in our machines (DELL M610).
>
> -> One of my colleagues played with receive flow steering; the intel
> card supports multi queue, so it seems we can gain a little with it:
>
> #!/bin/sh
>
> for x in $(seq 0 23); do echo FFFFFFFF > /sys/class/net/eth2/queues/rx-${x}/rps_cpus; done
> echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
> for x in $(seq 0 23); do echo 16384 > /sys/class/net/eth2/queues/rx-${x}/rps_flow_cnt; done
>
> > But after all this we are quite happy actually and are only limited by
> > the speed of the drives (2TB SATA).
> > The fsync is a fdatasync in fact which is available in newer glibc. If
> > you don't use btrfs (we use xfs) you need to use a recent glibc with
> > fdatasync support.
>
> Might that explain why we see lousy performance with xfs right now?
> That's the main reason we're stuck with btrfs for the moment.
>
> we're using debian 'stable': libc is
> libc6 2.11.3-3
> probably too old?

One reason for performance problems with that libc6 version is missing
syncfs() support. I backported a patch for 2.13, originally by Andreas Schwab,
schwab@redhat.com, to Debian stable code. Patch is attached. Copy the patch to
eglibc's debian/patches/, add to debian/patches/series, rebuild eglibc
packages (including libc6) with dpkg-buildpackage, install new libc6-dev,
rebuild ceph packages against it, install and retry. AFAIK, not even libc6 in
Debian experimental has syncfs() support.

Also see thread "OSD deadlock with cephfs client and OSD on same machine"

Amon Ott
--
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Managing directors:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649

[-- Attachment #2: syncfs.diff --]
[-- Type: text/x-diff, Size: 4110 bytes --]

 Versions.def               |    1 +
 misc/Makefile              |    4 ++--
 misc/Versions              |    3 +++
 misc/syncfs.c              |   33 +++++++++++++++++++++++++++++++++
 posix/unistd.h             |    9 ++++++++-
 sysdeps/unix/syscalls.list |    1 +
 6 files changed, 48 insertions(+), 3 deletions(-)
 create mode 100644 misc/syncfs.c

diff --git a/Versions.def b/Versions.def
index 0ccda50..e478fdd 100644
--- a/Versions.def
+++ b/Versions.def
@@ -30,5 +30,6 @@ libc {
   GLIBC_2.11
   GLIBC_2.12
+  GLIBC_2.14
 %ifdef USE_IN_LIBIO
   HURD_CTHREADS_0.3
 %endif
diff --git a/misc/Makefile b/misc/Makefile
index ee69361..52b13da 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -1,4 +1,4 @@
-# Copyright (C) 1991-2006, 2007, 2009 Free Software Foundation, Inc.
+# Copyright (C) 1991-2006, 2007, 2009, 2011 Free Software Foundation, Inc.
 # This file is part of the GNU C Library.
 
 # The GNU C Library is free software; you can redistribute it and/or
@@ -45,7 +45,7 @@ routines := brk sbrk sstk ioctl \
 	    getdtsz \
 	    gethostname sethostname getdomain setdomain \
 	    select pselect \
-	    acct chroot fsync sync fdatasync reboot \
+	    acct chroot fsync sync fdatasync syncfs reboot \
 	    gethostid sethostid \
 	    vhangup \
 	    swapon swapoff mktemp mkstemp mkstemp64 mkdtemp \
diff --git a/misc/Versions b/misc/Versions
index 3ffe3d1..3a31c7f 100644
--- a/misc/Versions
+++ b/misc/Versions
@@ -143,4 +143,7 @@ libc {
   GLIBC_2.11 {
     mkstemps; mkstemps64; mkostemps; mkostemps64;
   }
+  GLIBC_2.14 {
+    syncfs;
+  }
 }
diff --git a/misc/syncfs.c b/misc/syncfs.c
new file mode 100644
index 0000000..bd7328c
--- /dev/null
+++ b/misc/syncfs.c
@@ -0,0 +1,33 @@
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <errno.h>
+#include <unistd.h>
+
+/* Make all changes done to all files on the file system associated
+   with FD actually appear on disk.  */
+int
+syncfs (int fd)
+{
+  __set_errno (ENOSYS);
+  return -1;
+}
+
+
+stub_warning (syncfs)
+#include <stub-tag.h>
diff --git a/posix/unistd.h b/posix/unistd.h
index 5ebcaf1..aa11860 100644
--- a/posix/unistd.h
+++ b/posix/unistd.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 1991-2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 1991-2009, 2010, 2011 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -974,6 +974,13 @@ extern int fsync (int __fd);
 #endif /* Use BSD || X/Open || Unix98.  */
 
 
+#ifdef __USE_GNU
+/* Make all changes done to all files on the file system associated
+   with FD actually appear on disk.  */
+extern int syncfs (int __fd) __THROW;
+#endif
+
+
 #if defined __USE_BSD || defined __USE_XOPEN_EXTENDED
 /* Return identifier for the current host.  */
diff --git a/sysdeps/unix/syscalls.list b/sysdeps/unix/syscalls.list
index 04ed63c..ad49170 100644
--- a/sysdeps/unix/syscalls.list
+++ b/sysdeps/unix/syscalls.list
@@ -55,6 +55,7 @@ swapoff		-	swapoff		i:s	swapoff
 swapon		-	swapon		i:s	swapon
 symlink		-	symlink		i:ss	__symlink	symlink
 sync		-	sync		i:	sync
+syncfs		-	syncfs		i:i	syncfs
 sys_fstat	fxstat	fstat		i:ip	__syscall_fstat
 sys_mknod	xmknod	mknod		i:sii	__syscall_mknod
 sys_stat	xstat	stat		i:sp	__syscall_stat
--
1.7.4
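A side note on the receive packet steering (RPS) script quoted in this message: the value written to each queue's rps_cpus file is a hexadecimal bitmap of the CPUs allowed to process packets for that queue, one bit per CPU, so FFFFFFFF selects CPUs 0 to 31. The following is only a sketch for deriving such a mask for the first N CPUs; the helper name rps_mask is hypothetical and not part of the original script, and it covers at most 32 CPUs.

```shell
#!/bin/sh
# rps_mask N: print the hex bitmap that selects CPUs 0..N-1.
# Hypothetical helper for illustration; intended for N <= 32.
rps_mask() {
    # (1 << N) - 1 sets the N lowest bits.
    printf '%x\n' "$(( (1 << $1) - 1 ))"
}

rps_mask 24   # prints ffffff
rps_mask 32   # prints ffffffff
```

Whether a narrower mask than FFFFFFFF helps depends on the CPU layout of the machine, which the thread does not state.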
* Re: Infiniband 40GB
From: Yann Dupont @ 2012-06-04 9:58 UTC
To: Amon Ott
Cc: ceph-devel

On 04/06/2012 11:47, Amon Ott wrote:
> AFAIK, not even libc6 in Debian experimental has syncfs() support.
>
> Also see thread "OSD deadlock with cephfs client and OSD on same machine"

Great, thanks for the explanation.

... lots of tests to do this afternoon :)

I need to convert my OSDs to xfs, benchmark with the standard libc, then switch to a libc built with your patch and retest.

Thanks,
cheers,

--
Yann Dupont
* Re: Infiniband 40GB
From: Alexandre DERUMIER @ 2012-06-04 11:40 UTC
To: Amon Ott
Cc: ceph-devel, Yann Dupont

Hi,

I'm currently doing some tests with xfs, Debian wheezy with the standard libc6 (2.11.3-3) and a 3.2 kernel.

I'm running iostat (3 nodes with 5 OSDs each), and I see constant writes to the disks (as if the data were flushed from journal to disk every second).

The journal is big enough (20GB tmpfs) to handle 30s of writes.

Do you think it's related to the missing syncfs() support?

-Alexandre

----- Original message -----

From: "Amon Ott" <a.ott@m-privacy.de>
To: "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Cc: ceph-devel@vger.kernel.org
Sent: Monday 4 June 2012 11:47:22
Subject: Re: Infiniband 40GB

On Monday 04 June 2012 you wrote:
> On 04/06/2012 10:23, Stefan Majer wrote:
> > Hi Hannes,
> >
> > our production environment is running on 10GB infrastructure. We had a
> > lot of troubles till we got to where we are today.
> > We use Intel X520 D2 cards on our OSDs and nexus switch
> > infrastructure. All other cards we were testing failed horribly.
>
> we have Intel Corporation 82599EB 10 Gigabit Dual Port Backplane
> Connection (rev 01)... Don't know the 'commercial name'. ixgbe driver.
>
> > Some of the problems we encountered have been:
> > - page allocation failures in the ixgbe driver --> fixed in upstream
> > - problems with jumbo frames, we had to disable tso, gro, lro -->
> >   this is the most obscure thing
> > - various tuning via sysctl in the net.tcp and net.ipv4 area --> this
> >   was also the outcome of stefan's benchmarking odyssey.
>
> some tuning we made:
>
> -> Turning off Virtualisation extension in BIOS. Don't know why, but it
> gave us crappy performance. We usually put it on, because we use KVM a
> lot. In our case, OSD are in bare metal and disabling virtualisation
> extension gives us a very big boost.
> It may be a BIOS bug in our machines (DELL M610).
>
> -> One of my colleagues played with receive flow steering; the intel
> card supports multi queue, so it seems we can gain a little with it:
>
> #!/bin/sh
>
> for x in $(seq 0 23); do echo FFFFFFFF > /sys/class/net/eth2/queues/rx-${x}/rps_cpus; done
> echo 16384 > /proc/sys/net/core/rps_sock_flow_entries
> for x in $(seq 0 23); do echo 16384 > /sys/class/net/eth2/queues/rx-${x}/rps_flow_cnt; done
>
> > But after all this we are quite happy actually and are only limited by
> > the speed of the drives (2TB SATA).
> > The fsync is a fdatasync in fact which is available in newer glibc. If
> > you don't use btrfs (we use xfs) you need to use a recent glibc with
> > fdatasync support.
>
> Might that explain why we see lousy performance with xfs right now?
> That's the main reason we're stuck with btrfs for the moment.
>
> we're using debian 'stable': libc is
> libc6 2.11.3-3
> probably too old?

One reason for performance problems with that libc6 version is missing
syncfs() support. I backported a patch for 2.13, originally by Andreas Schwab,
schwab@redhat.com, to Debian stable code. Patch is attached. Copy the patch to
eglibc's debian/patches/, add to debian/patches/series, rebuild eglibc
packages (including libc6) with dpkg-buildpackage, install new libc6-dev,
rebuild ceph packages against it, install and retry. AFAIK, not even libc6 in
Debian experimental has syncfs() support.

Also see thread "OSD deadlock with cephfs client and OSD on same machine"

Amon Ott

--
Alexandre Derumier
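As a rough sanity check of the sizing remark in this message (a 20GB journal covering 30s of writes): a journal can absorb a burst only while the incoming write rate multiplied by the duration stays below its size. The sketch below assumes 1GB = 1024MB and uses a hypothetical helper name; the thread gives no measured write rate, so the figure is an upper bound rather than a result.

```shell
#!/bin/sh
# journal_max_rate SIZE_GB SECONDS: largest sustained write rate (MB/s,
# rounded down) that a journal of SIZE_GB can absorb for SECONDS.
# Hypothetical helper for illustration only.
journal_max_rate() {
    echo $(( $1 * 1024 / $2 ))
}

journal_max_rate 20 30   # prints 682
```

If the 20GB tmpfs is shared by the five journal files on a node, the bound applies to the node's combined write rate rather than to each OSD.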
* Re: Infiniband 40GB
From: Mark Nelson @ 2012-06-04 12:59 UTC
To: Alexandre DERUMIER
Cc: Amon Ott, ceph-devel, Yann Dupont

On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>
> I'm doing some iostats (3 nodes with 5 osd), and I see constant writes to disks (as if the data were flushed each second from journal to disk).
>
> Journal is big enough (20GB tmpfs) to handle 30s of write.
>
> Do you think it's related to the missing syncfs() support?
>
> -Alexandre

Hi Alexandre,

I've included some seekwatcher results for rados bench tests using 16
concurrent 4MB writes on an XFS OSD. One shows Ubuntu oneiric and the
other precise (i.e. no syncfs support vs. syncfs support in libc).
Unfortunately the original test was on 0.46 and the second test was on
0.47.2, so multiple things changed between the tests. Both were tested
with kernel 3.4. Interestingly the seeks/second don't seem to drop much
but the overall performance has about doubled. This was using a single
7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
journal in both cases. I'd definitely try 0.47.2 with a new libc though
and see how that works for you.

ceph 0.46/oneiric:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg

ceph 0.47.2/precise:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg

Mark
* Re: Infiniband 40GB
From: Alexandre DERUMIER @ 2012-06-04 13:07 UTC
To: Mark Nelson
Cc: Amon Ott, ceph-devel, Yann Dupont

Thanks Mark,

I'll rebuild my cluster with Ubuntu precise tomorrow. (I don't have time to backport/maintain libc6 ;)

BTW, do you mainly use Ubuntu at Inktank for your tests?

I'd like to have a setup as close as possible to the Inktank setup.

----- Original message -----

From: "Mark Nelson" <mark.nelson@inktank.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Monday 4 June 2012 14:59:58
Subject: Re: Infiniband 40GB

On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>
> I'm doing some iostats (3 nodes with 5 osd), and I see constant writes to disks (as if the data were flushed each second from journal to disk).
>
> Journal is big enough (20GB tmpfs) to handle 30s of write.
>
> Do you think it's related to the missing syncfs() support?
>
> -Alexandre

Hi Alexandre,

I've included some seekwatcher results for rados bench tests using 16
concurrent 4MB writes on an XFS OSD. One shows Ubuntu oneiric and the
other precise (i.e. no syncfs support vs. syncfs support in libc).
Unfortunately the original test was on 0.46 and the second test was on
0.47.2, so multiple things changed between the tests. Both were tested
with kernel 3.4. Interestingly the seeks/second don't seem to drop much
but the overall performance has about doubled. This was using a single
7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
journal in both cases. I'd definitely try 0.47.2 with a new libc though
and see how that works for you.

ceph 0.46/oneiric:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg

ceph 0.47.2/precise:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg

Mark

--
Alexandre Derumier
* Re: Infiniband 40GB
From: Mark Nelson @ 2012-06-04 13:28 UTC
To: Alexandre DERUMIER
Cc: Amon Ott, ceph-devel, Yann Dupont

Hi Alexandre,

A lot of our testing is on Ubuntu right now. I'm using the ceph and
kernel debs from ceph.gitbuilder.com for my tests. Post some results to
the list once you get your cluster set up!

Thanks,
Mark

On 6/4/12 8:07 AM, Alexandre DERUMIER wrote:
> Thanks Mark,
> I'll rebuild my cluster with ubuntu precise tomorrow. (Don't have time to backport/maintain libc6 ;)
>
> BTW, do you mainly use ubuntu at Inktank for your tests?
>
> I'd like to have a setup as close as possible to the Inktank setup.
>
> ----- Original message -----
>
> From: "Mark Nelson" <mark.nelson@inktank.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
> Sent: Monday 4 June 2012 14:59:58
> Subject: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>>
>> I'm doing some iostats (3 nodes with 5 osd), and I see constant writes to disks (as if the data were flushed each second from journal to disk).
>>
>> Journal is big enough (20GB tmpfs) to handle 30s of write.
>>
>> Do you think it's related to the missing syncfs() support?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on an XFS OSD. One shows Ubuntu oneiric and the
> other precise (i.e. no syncfs support vs. syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly the seeks/second don't seem to drop much
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark
* Re: Infiniband 40GB
From: Gregory Farnum @ 2012-06-04 15:11 UTC
To: ceph-devel
Cc: Alexandre DERUMIER, Amon Ott, Yann Dupont, Mark Nelson

On Monday, June 4, 2012 at 6:28 AM, Mark Nelson wrote:
> Hi Alexandre,
>
> A lot of our testing is on Ubuntu right now. I'm using the ceph and
> kernel debs from ceph.gitbuilder.com for my tests. Post some results to
> the list once you get your cluster setup!

I think he means gitbuilder.ceph.com. ;)
-Greg
* Re: Infiniband 40GB
From: Mark Nelson @ 2012-06-04 15:34 UTC
To: Gregory Farnum
Cc: ceph-devel, Alexandre DERUMIER, Amon Ott, Yann Dupont

On 06/04/2012 10:11 AM, Gregory Farnum wrote:
> On Monday, June 4, 2012 at 6:28 AM, Mark Nelson wrote:
>> Hi Alexandre,
>>
>> A lot of our testing is on Ubuntu right now. I'm using the ceph and
>> kernel debs from ceph.gitbuilder.com for my tests. Post some results to
>> the list once you get your cluster setup!
>
> I think he means gitbuilder.ceph.com. ;)
> -Greg

Doh! This is why I need caffeine before writing emails. Thanks Greg. :)

Mark
* Re: Infiniband 40GB
From: Alexandre DERUMIER @ 2012-06-06 16:05 UTC
To: Mark Nelson
Cc: Amon Ott, ceph-devel, Yann Dupont

Hi,

I have rebuilt my cluster with Ubuntu precise:

- kernel 3.2
- ceph 0.47.2
- libc6 2.15
- 3 nodes, 5 OSDs (xfs) per node and 1 tmpfs with 5 journal files

I launched rados bench, and I again see constant writes to xfs...

Maybe this is related to tmpfs?

I'll retry with kernel 3.4 from Inktank tomorrow.
I'll also try with the journal on a physical disk with an xfs partition.

I'll keep you posted.

----- Original message -----

From: "Mark Nelson" <mark.nelson@inktank.com>
To: "Alexandre DERUMIER" <aderumier@odiso.com>
Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
Sent: Monday 4 June 2012 14:59:58
Subject: Re: Infiniband 40GB

On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>
> I'm doing some iostats (3 nodes with 5 osd), and I see constant writes to disks (as if the data were flushed each second from journal to disk).
>
> Journal is big enough (20GB tmpfs) to handle 30s of write.
>
> Do you think it's related to the missing syncfs() support?
>
> -Alexandre

Hi Alexandre,

I've included some seekwatcher results for rados bench tests using 16
concurrent 4MB writes on an XFS OSD. One shows Ubuntu oneiric and the
other precise (i.e. no syncfs support vs. syncfs support in libc).
Unfortunately the original test was on 0.46 and the second test was on
0.47.2, so multiple things changed between the tests. Both were tested
with kernel 3.4. Interestingly the seeks/second don't seem to drop much
but the overall performance has about doubled. This was using a single
7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
journal in both cases. I'd definitely try 0.47.2 with a new libc though
and see how that works for you.

ceph 0.46/oneiric:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg

ceph 0.47.2/precise:
http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg

Mark

--
Alexandre Derumier
* Re: Infiniband 40GB
From: Mark Nelson @ 2012-06-06 16:43 UTC
To: Alexandre DERUMIER
Cc: Amon Ott, ceph-devel, Yann Dupont

Hi Alexandre,

If you can run blktrace during your test on one of the OSD data disks
and send me the results, I can take a look at them. The rados bench
settings and output would be useful too.

Thanks,
Mark

On 6/6/12 11:05 AM, Alexandre DERUMIER wrote:
> Hi, I have rebuilt my cluster with ubuntu precise:
>
> - kernel 3.2
> - ceph 0.47.2
> - libc6 2.15
> - 3 nodes, 5 osd (xfs) per node and 1 tmpfs with 5 journal files
>
> I launched rados bench,
> and I again see constant writes to xfs...
>
> Maybe this is related to tmpfs?
>
> I'll retry with kernel 3.4 from Inktank tomorrow.
> I'll also try with journal on a physical disk with xfs partition.
>
> I'll keep you posted.
>
> ----- Original message -----
>
> From: "Mark Nelson" <mark.nelson@inktank.com>
> To: "Alexandre DERUMIER" <aderumier@odiso.com>
> Cc: "Amon Ott" <a.ott@m-privacy.de>, ceph-devel@vger.kernel.org, "Yann Dupont" <Yann.Dupont@univ-nantes.fr>
> Sent: Monday 4 June 2012 14:59:58
> Subject: Re: Infiniband 40GB
>
> On 6/4/12 6:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>>
>> I'm currently doing some tests with xfs, debian wheezy with standard libc6 (2.11.3-3) and 3.2 kernel.
>>
>> I'm doing some iostats (3 nodes with 5 osd), and I see constant writes to disks (as if the data were flushed each second from journal to disk).
>>
>> Journal is big enough (20GB tmpfs) to handle 30s of write.
>>
>> Do you think it's related to the missing syncfs() support?
>>
>> -Alexandre
>
> Hi Alexandre,
>
> I've included some seekwatcher results for rados bench tests using 16
> concurrent 4MB writes on an XFS OSD. One shows Ubuntu oneiric and the
> other precise (i.e. no syncfs support vs. syncfs support in libc).
> Unfortunately the original test was on 0.46 and the second test was on
> 0.47.2, so multiple things changed between the tests. Both were tested
> with kernel 3.4. Interestingly the seeks/second don't seem to drop much
> but the overall performance has about doubled. This was using a single
> 7200rpm disk for the OSD data disk and a separate 7200rpm disk for the
> journal in both cases. I'd definitely try 0.47.2 with a new libc though
> and see how that works for you.
>
> ceph 0.46/oneiric:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-oneiric-3.4.mpg
>
> ceph 0.47.2/precise:
> http://nhm.ceph.com/movies/mailinglist-tests/xfs-osd0-precise-3.4.mpg
>
> Mark
* Re: Infiniband 40GB
From: Stefan Priebe @ 2012-06-04 15:42 UTC
To: Amon Ott
Cc: Yann Dupont, ceph-devel

Hi Amon,

thanks for your backported patch. It doesn't apply cleanly to Debian squeeze (stable), though: it expects a glibc 2.12 entry in Versions.def, but Debian is only at 2.11. Do you use another patch too?

Stefan
* Re: Infiniband 40GB
From: Amon Ott @ 2012-06-05 7:08 UTC
To: Stefan Priebe
Cc: Yann Dupont, ceph-devel

On Monday 04 June 2012, Stefan Priebe wrote:
> Hi Amon,
>
> thanks for your backported patch. At least it doesn't cleanly apply to
> debian squeeze stable as it wants a glibc 2.12 in Versions.def but Debian
> is only at 2.11? Do you use another patch too?

I ripped the patch right out of our previously built 2.11.3-3 source tree. It
needs to be last in the series file, because several existing Debian patches
modify the sources at various places. I could also make our compiled packages
available to you for download.

Amon Ott
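The ordering requirement described here (the syncfs patch has to come last in debian/patches/series) can be scripted. The sketch below assumes a plain series file with one patch name per line; the helper name add_patch_last is hypothetical, and syncfs.diff is simply the attachment name used earlier in the thread.

```shell
#!/bin/sh
# add_patch_last SERIES PATCH: make PATCH the final entry of a series file.
# Hypothetical helper; assumes one patch name per line, without options.
add_patch_last() {
    series=$1
    patch=$2
    tmp="$series.tmp.$$"
    # Create the series file if it does not exist yet.
    [ -f "$series" ] || : > "$series"
    # Drop any existing entry for the patch, then append it at the end.
    grep -v -x -F -- "$patch" "$series" > "$tmp" || true
    echo "$patch" >> "$tmp"
    mv "$tmp" "$series"
}

# Example, run from the unpacked eglibc source tree:
#   add_patch_last debian/patches/series syncfs.diff
```

Removing an existing entry before appending keeps the call idempotent, so running it on a series file that already lists the patch near the top only moves the entry to the end.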
* Re: Infiniband 40GB
From: Stefan Priebe - Profihost AG @ 2012-06-05 7:46 UTC
To: Amon Ott
Cc: Yann Dupont, ceph-devel

On 05.06.2012 09:08, Amon Ott wrote:
> On Monday 04 June 2012, Stefan Priebe wrote:
>> Hi Amon,
>>
>> thanks for your backported patch. At least it doesn't cleanly apply to
>> debian squeeze stable as it wants a glibc 2.12 in Versions.def but Debian
>> is only at 2.11? Do you use another patch too?
>
> I ripped the patch right out of our previously built 2.11.3-3 source tree. It
> needs to be last in the series file, because several existing Debian patches
> modify the sources at various places. I could also make our compiled packages
> available to you for download.

Sorry, I had added the patch at the front of the series file...

Thanks
Stefan
* Re: Infiniband 40GB
From: Stefan Priebe - Profihost AG @ 2012-06-06 10:48 UTC
To: Amon Ott
Cc: Yann Dupont, ceph-devel

Hi Amon,

I've applied your patch:

# strings /lib/libc-2.11.3.so |grep -i syncfs
syncfs

But ceph's configure still claims there is no syncfs support.

# ./configure |grep -i sync
checking for syncfs... no
checking for sync_file_range... yes

Any ideas?

Hint: I'm compiling my packages in an OpenVZ RHEL6-based virtual container, so the kernel I'm compiling on does not support syncfs. Is this the reason?

Stefan
* Re: Infiniband 40GB
From: Amon Ott @ 2012-06-06 10:57 UTC
To: Stefan Priebe - Profihost AG
Cc: ceph-devel

On Wednesday 06 June 2012, Stefan Priebe - Profihost AG wrote:
> Hi Amon,
>
> i've added your patch:
> # strings /lib/libc-2.11.3.so |grep -i syncfs
> syncfs
>
> But configure of ceph still claims there is no syncfs support.
>
> # ./configure |grep -i sync
> checking for syncfs... no
> checking for sync_file_range... yes
>
> Any ideas?

Did you also install the new libc6-dev, which contains the new header files?

Amon Ott
* Re: Infiniband 40GB 2012-06-06 10:57 ` Amon Ott @ 2012-06-06 11:02 ` Stefan Priebe - Profihost AG 2012-06-07 11:33 ` Amon Ott 0 siblings, 1 reply; 36+ messages in thread From: Stefan Priebe - Profihost AG @ 2012-06-06 11:02 UTC (permalink / raw) To: Amon Ott; +Cc: ceph-devel On 06.06.2012 12:57, Amon Ott wrote: > On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG: >> Hi Amon, >> >> i've added your patch: >> # strings /lib/libc-2.11.3.so |grep -i syncfs >> syncfs >> >> But configure of ceph still claims there is no syncfs support. >> >> # ./configure |grep -i sync >> checking for syncfs... no >> checking for sync_file_range... yes >> >> Any ideas? > > Did you also install the new libc6-dev, which contains the new header files? Yes. /usr/include/unistd.h: extern int syncfs (int __fd) __THROW; /usr/include/gnu/stubs-64.h: #define __stub_syncfs Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB 2012-06-06 11:02 ` Stefan Priebe - Profihost AG @ 2012-06-07 11:33 ` Amon Ott 2012-06-07 12:44 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 36+ messages in thread From: Amon Ott @ 2012-06-07 11:33 UTC (permalink / raw) To: Stefan Priebe - Profihost AG; +Cc: ceph-devel On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG: > On 06.06.2012 12:57, Amon Ott wrote: > > On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG: > >> Hi Amon, > >> > >> i've added your patch: > >> # strings /lib/libc-2.11.3.so |grep -i syncfs > >> syncfs > >> > >> But configure of ceph still claims there is no syncfs support. > >> > >> # ./configure |grep -i sync > >> checking for syncfs... no > >> checking for sync_file_range... yes > >> > >> Any ideas? > > > > Did you also install the new libc6-dev, which contains the new header > > files? > > Yes. > > /usr/include/unistd.h: > extern int syncfs (int __fd) __THROW; > > /usr/include/gnu/stubs-64.h: > #define __stub_syncfs Are you building on 32 or 64 Bit? We have 32 here. Amon Ott -- Dr. Amon Ott m-privacy GmbH Tel: +49 30 24342334 Am Köllnischen Park 1 Fax: +49 30 24342336 10179 Berlin http://www.m-privacy.de Amtsgericht Charlottenburg, HRB 84946 Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky GnuPG-Key-ID: 0x2DD3A649 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB 2012-06-07 11:33 ` Amon Ott @ 2012-06-07 12:44 ` Stefan Priebe - Profihost AG 0 siblings, 0 replies; 36+ messages in thread From: Stefan Priebe - Profihost AG @ 2012-06-07 12:44 UTC (permalink / raw) To: Amon Ott; +Cc: ceph-devel On 07.06.2012 13:33, Amon Ott wrote: > On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG: >> On 06.06.2012 12:57, Amon Ott wrote: >>> On Wednesday 06 June 2012 wrote Stefan Priebe - Profihost AG: >> /usr/include/unistd.h: >> extern int syncfs (int __fd) __THROW; >> >> /usr/include/gnu/stubs-64.h: >> #define __stub_syncfs > > Are you building on 32 or 64 Bit? We have 32 here. 64-bit, but does this make a difference? Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
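A note on the `#define __stub_syncfs` line quoted from gnu/stubs-64.h above: glibc lists a function in its stubs header when the library was built without a real implementation of it. The symbol then exists (so `strings` finds it), but every call fails with ENOSYS, and autoconf's standard function check deliberately answers "no" for functions marked this way. Whether that is what happened in this particular build is not confirmed in the thread. The minimal Python sketch below, assuming a Linux host, asks the running libc directly; `probe_syncfs` is a hypothetical helper name, not part of Ceph or glibc.

```python
import ctypes
import errno
import os


def probe_syncfs(path="."):
    """Call syncfs() on `path` and classify the outcome.

    Returns "ok", "missing" (libc exports no syncfs symbol),
    "enosys" (stubbed in libc or unsupported by the kernel),
    or "error:<ERRNO NAME>" for any other failure.
    """
    try:
        # CDLL(None) opens the C library already loaded into this process.
        libc = ctypes.CDLL(None, use_errno=True)
        syncfs = libc.syncfs
    except (OSError, AttributeError):
        return "missing"

    syncfs.argtypes = [ctypes.c_int]
    syncfs.restype = ctypes.c_int

    fd = os.open(path, os.O_RDONLY)
    try:
        if syncfs(fd) == 0:
            return "ok"
        err = ctypes.get_errno()
    finally:
        os.close(fd)

    if err == errno.ENOSYS:
        return "enosys"
    return "error:" + errno.errorcode.get(err, str(err))


if __name__ == "__main__":
    print(probe_syncfs("."))
```

Run on the build container and again on a target OSD host: an "enosys" result on one and "ok" on the other would point at the build environment rather than at the patch. Note that a successful call really does flush the filesystem that contains `path`.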
* Re: Infiniband 40GB 2012-06-04 8:23 ` Stefan Majer 2012-06-04 9:21 ` Yann Dupont @ 2012-06-05 8:54 ` Stefan Priebe - Profihost AG 1 sibling, 0 replies; 36+ messages in thread From: Stefan Priebe - Profihost AG @ 2012-06-05 8:54 UTC (permalink / raw) To: Stefan Majer; +Cc: Hannes Reinecke, Mark Nelson, ceph-devel@vger.kernel.org Hi Stefan, On 04.06.2012 10:23, Stefan Majer wrote: > our production environment is running on 10GB infrastructure. We had a > lot of troubles till we got to where we are today. > We use Intel X520 D2 cards on our OSD´s and nexus switch > infrastructure. All other cards we where testing failed horrible. Have you also tried Emulex cards (also used by HP)? Stefan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB 2012-06-04 6:22 ` Hannes Reinecke 2012-06-04 7:26 ` Stefan Priebe - Profihost AG @ 2012-06-04 12:28 ` Mark Nelson 2012-06-04 12:34 ` Tomasz Paszkowski 1 sibling, 1 reply; 36+ messages in thread From: Mark Nelson @ 2012-06-04 12:28 UTC (permalink / raw) To: Hannes Reinecke; +Cc: Stefan Priebe, ceph-devel@vger.kernel.org On 6/4/12 1:22 AM, Hannes Reinecke wrote: > On 06/03/2012 02:56 PM, Mark Nelson wrote: >> On 6/3/12 3:10 AM, Stefan Priebe wrote: >>> Hi List, >>> >>> has anybody already tried CEPH over Infiniband 40GB? >>> >>> Stefan >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >>> ceph-devel" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> Hi Stefan, >> >> A couple of folks have done DDR IB. For now you are limited to >> ipoib though. If you have the hardware available I'd be really >> curious what kind of throughput/latencies you see. >> > Hehe. > > Good luck with that. > > We've tried on 10GigE with _disastrous_ results. > Up to the point where 1GigE was actually _faster_. Strange! Do you see good results with something like iperf? Internally we have 10GE on some of our test nodes and I can get up to around 600MB/s per node during rados bench testing. > So far we've uncovered two issues: > - intel_idle was/is seriously broken (we've tried on 3.0-stable, > so might've been fixed by now) > - osd-server is calling 'fsync' on each and every write request. > Does wonders for performance ... For syncfs support, upgrade to a distro with glibc 2.13+ (ie precise). I've noticed a significant improvement in our spinning disk performance going from oneiric and kernel 3.3 to precise and kernel 3.4. I think part of this is related to the raid drivers for the cards we have in our test boxes though. I'm actually recording blktrace and seekwatcher results for all of our tests to specifically look at syncs and disk seek behavior... 
> > Cheers, > > Hannes Mark ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB 2012-06-04 12:28 ` Mark Nelson @ 2012-06-04 12:34 ` Tomasz Paszkowski 2012-06-04 12:40 ` Mark Nelson 0 siblings, 1 reply; 36+ messages in thread From: Tomasz Paszkowski @ 2012-06-04 12:34 UTC (permalink / raw) To: Mark Nelson; +Cc: Hannes Reinecke, Stefan Priebe, ceph-devel@vger.kernel.org On Mon, Jun 4, 2012 at 2:28 PM, Mark Nelson <mark.nelson@inktank.com> wrote: > > For syncfs support, upgrade to a distro with glibc 2.13+ (ie precise). I've > noticed a significant improvement in our spinning disk performance going > from oneiric and kernel 3.3 to precise and kernel 3.4. I think part of this > is related to the raid drivers for the cards we have in our test boxes > though. I'm actually recording blktrace and seekwatcher results for all of > our tests to specifically look at syncs and disk seek behavior... > Correct me if I'm wrong, but AFAIR precise is running a 3.2 kernel. -- Tomasz Paszkowski SS7, Asterisk, SAN, Datacenter, Cloud Computing +48500166299 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Infiniband 40GB 2012-06-04 12:34 ` Tomasz Paszkowski @ 2012-06-04 12:40 ` Mark Nelson 0 siblings, 0 replies; 36+ messages in thread From: Mark Nelson @ 2012-06-04 12:40 UTC (permalink / raw) To: Tomasz Paszkowski Cc: Hannes Reinecke, Stefan Priebe, ceph-devel@vger.kernel.org On 6/4/12 7:34 AM, Tomasz Paszkowski wrote: > On Mon, Jun 4, 2012 at 2:28 PM, Mark Nelson<mark.nelson@inktank.com> wrote: >> >> For syncfs support, upgrade to a distro with glibc 2.13+ (ie precise). I've >> noticed a significant improvement in our spinning disk performance going >> from oneiric and kernel 3.3 to precise and kernel 3.4. I think part of this >> is related to the raid drivers for the cards we have in our test boxes >> though. I'm actually recording blktrace and seekwatcher results for all of >> our tests to specifically look at syncs and disk seek behavior... >> > > Correct me if I'm wrong, but AFAIR precise is running a 3.2 kernel. Sorry, I should have been clearer. We were running oneiric with our own kernel 3.3 build and are now running precise with our own kernel 3.4 build (available on gitbuilder.ceph.com). Mark ^ permalink raw reply [flat|nested] 36+ messages in thread
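Since the exchange above turns on which glibc and kernel versions a host is actually running, a quick check can be sketched as follows. The thresholds are an assumption taken from the syncfs(2) manual page (kernel 2.6.39, glibc 2.14), which is slightly stricter than the "glibc 2.13+" mentioned in the thread; `parse_version`, `meets` and `report` are hypothetical helper names.

```python
import platform

# Minimums per the syncfs(2) man page, not per the thread.
GLIBC_MIN = (2, 14)
KERNEL_MIN = (2, 6, 39)


def parse_version(text):
    """Turn '3.2.0-24-generic' into (3, 2, 0); stop at the first suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # component carried a suffix such as '0-24-generic'
    return tuple(parts)


def meets(version_text, minimum):
    """True if the parsed version is at least `minimum` (tuple comparison)."""
    return parse_version(version_text) >= minimum


def report():
    """Describe the running libc and kernel against the assumed minimums."""
    libc_name, libc_version = platform.libc_ver()
    kernel = platform.release()
    libc_ok = libc_name == "glibc" and meets(libc_version, GLIBC_MIN)
    return {
        "libc": (libc_name, libc_version, libc_ok),
        "kernel": (kernel, meets(kernel, KERNEL_MIN)),
    }


if __name__ == "__main__":
    for name, info in report().items():
        print(name, info)
```

A version check like this only says what the host claims to provide. A patched older glibc, such as the 2.11.3 build discussed earlier in the thread, would fail it while possibly still offering a working syncfs, so it complements rather than replaces an actual call.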
end of thread, other threads:[~2012-06-07 17:15 UTC | newest]
Thread overview: 36+ messages
[not found] <a81f3855-1c7d-447b-9bbf-6a891e372909@mailpro>
2012-06-07 3:31 ` Infiniband 40GB Alexandre DERUMIER
2012-06-07 11:25 ` Alexandre DERUMIER
2012-06-07 17:15 ` Mark Nelson
2012-06-03 8:10 Stefan Priebe
2012-06-03 12:56 ` Mark Nelson
2012-06-04 6:22 ` Hannes Reinecke
2012-06-04 7:26 ` Stefan Priebe - Profihost AG
2012-06-04 7:39 ` Hannes Reinecke
2012-06-04 7:53 ` Stefan Priebe - Profihost AG
2012-06-04 8:02 ` Hannes Reinecke
2012-06-04 8:23 ` Stefan Majer
2012-06-04 9:21 ` Yann Dupont
2012-06-04 9:35 ` Alexandre DERUMIER
2012-06-04 9:53 ` Yann Dupont
2012-06-04 9:47 ` Amon Ott
2012-06-04 9:58 ` Yann Dupont
2012-06-04 11:40 ` Alexandre DERUMIER
2012-06-04 12:59 ` Mark Nelson
2012-06-04 13:07 ` Alexandre DERUMIER
2012-06-04 13:28 ` Mark Nelson
2012-06-04 15:11 ` Gregory Farnum
2012-06-04 15:34 ` Mark Nelson
2012-06-06 16:05 ` Alexandre DERUMIER
2012-06-06 16:43 ` Mark Nelson
2012-06-04 15:42 ` Stefan Priebe
2012-06-05 7:08 ` Amon Ott
2012-06-05 7:46 ` Stefan Priebe - Profihost AG
2012-06-06 10:48 ` Stefan Priebe - Profihost AG
2012-06-06 10:57 ` Amon Ott
2012-06-06 11:02 ` Stefan Priebe - Profihost AG
2012-06-07 11:33 ` Amon Ott
2012-06-07 12:44 ` Stefan Priebe - Profihost AG
2012-06-05 8:54 ` Stefan Priebe - Profihost AG
2012-06-04 12:28 ` Mark Nelson
2012-06-04 12:34 ` Tomasz Paszkowski
2012-06-04 12:40 ` Mark Nelson