* Re: [ceph-commit] Ceph Zfs
From: Sage Weil @ 2012-10-25 15:36 UTC
To: Raghunandhan; +Cc: ceph-devel

[moved to ceph-devel]

On Thu, 25 Oct 2012, Raghunandhan wrote:
> Hi All,
>
> I have been working with ceph for quite a long time and am trying to stitch
> zfs together with ceph. I was able to do it to a certain extent, as follows:
> 1. zpool creation
> 2. set dedup
> 3. create a mountable volume of zfs (zfs create)
> 4. format the volume with ext4, enabling xattr
> 5. mkcephfs on the volume
>
> This actually works and dedup is perfect. But I need to avoid multiple layers
> on the storage, since the performance is very slow and kernel timeouts
> occur often with 8GB of RAM. I want to test the performance between btrfs and
> zfs. I want to avoid the above multiple layering on storage and make the ceph
> cluster aware of zfs. Let me know if anyone has a workaround for this.

I'm not familiar enough with zfs to know what 'mountable volume' means. Is
that a block device/LUN that you're putting ext4 on? Probably the best
results will come from creating a zfs *file system* (using the ZPL or
whatever it is) and running ceph-osd on top of that.

There is at least one open bug from someone having problems there, but
we'd very much like to sort out the problem.

sage

* Re: [ceph-commit] Ceph Zfs
From: Raghunandhan @ 2012-10-26 4:46 UTC
To: Sage Weil; +Cc: ceph-devel

Hi Sage,

Thanks for replying. Once a zpool is created, if I mount it on
/var/lib/ceph/osd/ceph-0, cephfs doesn't recognize it as a superblock
and hence it fails. I'm trying to build this on our cloud storage. Since
btrfs has not been stable, and they have not come up with online dedup, I have
no other choice for now but to work with zfs and ceph, which makes sense.

So what I did exactly was create a zpool store:
1. Then used the same store and made a block device from it using zfs create
2. Once the zfs create was successful, I was able to format it with ext4
   using xattr
3. On top of it was ceph

Following this process doesn't make sense because of the multiple layers on
the storage, and ceph consumes a lot of RAM and CPU cycles, which ends up
in kernel hung tasks. It would be great if there were a way I could directly
use the zfs pool with ceph and make it work.

---
Regards,
Raghunandhan.G
IIHT Cloud Solutions Pvt. Ltd.
#15, 4th Floor, 'A' Wing, Sri Lakshmi Complex,
St. Marks Road, Bangalore - 560 001, India

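A minimal sketch, not taken from the thread, of the two layouts under discussion: the layered zvol + ext4 setup described above, and a plain ZFS file system mounted directly at the OSD data directory. The pool name "store" is read from the message; the device /dev/sdb, the volume size, and the dataset names are assumptions for illustration. The script only prints the commands and runs nothing.

```python
"""Print (do not run) the commands for two ways of backing a Ceph OSD with ZFS."""
import shlex

# Hypothetical names, chosen only for illustration.
POOL = "store"
DEVICE = "/dev/sdb"
OSD_DIR = "/var/lib/ceph/osd/ceph-0"


def layered_zvol_plan(pool=POOL, device=DEVICE, osd_dir=OSD_DIR, size="50G"):
    """zpool -> zvol (block device) -> ext4 -> OSD: the layered setup."""
    zvol = f"/dev/zvol/{pool}/osdvol"
    return [
        ["zpool", "create", pool, device],
        ["zfs", "set", "dedup=on", pool],
        ["zfs", "create", "-V", size, f"{pool}/osdvol"],  # -V creates a volume
        ["mkfs.ext4", zvol],
        ["mount", "-o", "user_xattr", zvol, osd_dir],
    ]


def direct_dataset_plan(pool=POOL, device=DEVICE, osd_dir=OSD_DIR):
    """zpool -> ZFS file system mounted at the OSD directory: no ext4 layer."""
    return [
        ["zpool", "create", pool, device],
        ["zfs", "set", "dedup=on", pool],
        ["zfs", "create", "-o", f"mountpoint={osd_dir}", f"{pool}/osd0"],
    ]


def render(plan):
    """Turn a list of argv lists into shell-quoted command lines."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in plan]


if __name__ == "__main__":
    for title, plan in (("layered", layered_zvol_plan()),
                        ("direct", direct_dataset_plan())):
        print(f"# {title}")
        print("\n".join(render(plan)))
```

The second plan has two fewer steps because ZFS mounts the dataset itself; whether ceph-osd then runs cleanly on it is exactly what the rest of the thread investigates.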
* Re: [ceph-commit] Ceph Zfs
From: Dan Mick @ 2012-10-26 19:38 UTC
To: Raghunandhan; +Cc: Sage Weil, ceph-devel

On 10/25/2012 09:46 PM, Raghunandhan wrote:
> Hi Sage,
>
> Thanks for replying. Once a zpool is created, if I mount it on
> /var/lib/ceph/osd/ceph-0, cephfs doesn't recognize it as a superblock
> and hence it fails.

I assume you mean "once a zfs is created"? One can't mount zpools, can one?

> I'm trying to build this on our cloud storage. Since btrfs has not been
> stable, and they have not come up with online dedup, I have no other choice
> for now but to work with zfs and ceph, which makes sense.
>
> So what I did exactly was create a zpool store:
> 1. Then used the same store and made a block device from it using zfs create
> 2. Once the zfs create was successful, I was able to format it with ext4
>    using xattr
> 3. On top of it was ceph
>
> Following this process doesn't make sense because of the multiple layers on
> the storage, and ceph consumes a lot of RAM and CPU cycles, which ends up
> in kernel hung tasks. It would be great if there were a way I could directly
> use the zfs pool with ceph and make it work.

Have you actually tried making a zfs filesystem in the zpool, and using
that as backing store for the osd?

* Re: [ceph-commit] Ceph Zfs
From: Raghunandhan @ 2012-10-27 5:14 UTC
To: Dan Mick; +Cc: Sage Weil, ceph-devel

Hi Dan,

Yes, once a zpool is created, we can make a partition out of it using
"zfs create -V". The newly created partition will be visible in fdisk.
The same partition can then be formatted with ext4 and used with ceph-osd.

I have also tried using a zfs filesystem in the zpool and mapped it to the osd.
When I run mkcephfs I get "error creating empty object store /osd.0: (22)
invalid argument":

== osd.0 ===
2012-10-27 10:40:33.939961 7f6e6165d780 -1 filestore(/osd.0) mkjournal error creating journal on /osd.0/journal: (22) Invalid argument
2012-10-27 10:40:33.939981 7f6e6165d780 -1 OSD::mkfs: FileStore::mkfs failed with error -22
2012-10-27 10:40:33.940036 7f6e6165d780 -1 ** ERROR: error creating empty object store in /osd.0: (22) Invalid argument
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.3zqOx7Btvl --init-daemon osd.0'

---
Regards,
Raghunandhan.G
IIHT Cloud Solutions Pvt. Ltd.
#15, 4th Floor, 'A' Wing, Sri Lakshmi Complex,
St. Marks Road, Bangalore - 560 001, India

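The "(22)" and "-22" in the log above are the same errno reported two ways. A small sketch, assuming a Linux host and using only Python's standard library, that pulls the number out of such a line and names it. The regular expression is an assumption of this note and is only meant to match the two formats seen in this message.

```python
import errno
import os
import re

# Matches "(22) Invalid argument" and "failed with error -22".
ERRNO_RE = re.compile(r"\((\d+)\)|error -(\d+)")


def decode_errno(line):
    """Return (number, symbolic name, description), or None if no errno is found."""
    match = ERRNO_RE.search(line)
    if match is None:
        return None
    number = int(match.group(1) or match.group(2))
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


if __name__ == "__main__":
    for line in (
        "filestore(/osd.0) mkjournal error creating journal on "
        "/osd.0/journal: (22) Invalid argument",
        "OSD::mkfs: FileStore::mkfs failed with error -22",
    ):
        print(decode_errno(line))
```

Both lines decode to EINVAL, which narrows the failure to a call that rejected one of its arguments rather than, say, a permissions or space problem.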
* Re: [ceph-commit] Ceph Zfs
From: Sage Weil @ 2012-10-27 17:15 UTC
To: Raghunandhan; +Cc: Dan Mick, ceph-devel

On Sat, 27 Oct 2012, Raghunandhan wrote:
> Hi Dan,
>
> Yes, once a zpool is created, we can make a partition out of it using
> "zfs create -V". The newly created partition will be visible in fdisk.
> The same partition can then be formatted with ext4 and used with ceph-osd.
>
> I have also tried using a zfs filesystem in the zpool and mapped it to the osd.
> When I run mkcephfs I get "error creating empty object store /osd.0: (22)
> invalid argument":
>
> == osd.0 ===
> 2012-10-27 10:40:33.939961 7f6e6165d780 -1 filestore(/osd.0) mkjournal error creating journal on /osd.0/journal: (22) Invalid argument
> 2012-10-27 10:40:33.939981 7f6e6165d780 -1 OSD::mkfs: FileStore::mkfs failed with error -22
> 2012-10-27 10:40:33.940036 7f6e6165d780 -1 ** ERROR: error creating empty object store in /osd.0: (22) Invalid argument
> failed: '/sbin/mkcephfs -d /tmp/mkcephfs.3zqOx7Btvl --init-daemon osd.0'

Can you generate a log with 'debug filestore = 20' of this happening so we
can see exactly which operation is failing with -EINVAL? There is
probably some ioctl or syscall that is going awry.

Thanks!
sage

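One way to capture the log requested here is to raise the filestore debug level in ceph.conf before re-running mkcephfs. A minimal sketch that renders the fragment with Python's configparser. Placing the option in the [osd] section is an assumption (the message only names the option), and the helper name is hypothetical.

```python
import configparser
import io


def render_osd_options(options):
    """Render a ceph.conf-style [osd] section from a dict of option -> value."""
    parser = configparser.ConfigParser()
    parser["osd"] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # 'debug filestore = 20' is the setting named in the message above.
    print(render_osd_options({"debug filestore": "20"}), end="")
```

The rendered text is meant to be merged by hand into an existing ceph.conf rather than written over it.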
* Re: [ceph-commit] Ceph Zfs
From: Raghunandhan @ 2012-10-28 5:19 UTC
To: Sage Weil, Dan Mick; +Cc: ceph-devel

On 27-10-2012 23:45, Sage Weil wrote:
> Can you generate a log with 'debug filestore = 20' of this happening so we
> can see exactly which operation is failing with -EINVAL? There is
> probably some ioctl or syscall that is going awry.
>
> Thanks!
> sage

The above issue was rectified with journal dio=false in ceph.conf.

Here is the ceph status when used with a zfs filesystem. The OSD dies on one
node, but it's still up on the other node.

# ceph -s
   health HEALTH_WARN 407 pgs degraded; 169 pgs down; 169 pgs peering; 15 pgs recovering; 323 pgs stuck unclean; recovery 38/42 degraded (90.476%); 19/21 unfound (90.476%); 1/2 in osds are down
   monmap e1: 2 mons at {a=11.0.0.2:6789/0,b=11.0.0.3:6789/0}, election epoch 4, quorum 0,1 a,b
   osdmap e7: 2 osds: 1 up, 2 in
   pgmap v10: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)
   mdsmap e5: 1/1/1 up {0=a=up:active}, 1 up:standby

Log generated while both OSDs were up, until one of them went down:

2012-10-27 11:14:07.152741 mon.0 11.0.0.2:6789/0 27 : [INF] osdmap e5: 2 osds: 2 up, 2 in
2012-10-27 11:14:07.192719 mon.0 11.0.0.2:6789/0 28 : [INF] pgmap v6: 576 pgs: 576 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
2012-10-27 11:14:12.007671 mon.0 11.0.0.2:6789/0 29 : [INF] pgmap v7: 576 pgs: 272 creating, 43 active, 253 active+clean, 8 active+recovering; 1243 bytes data, 1003 MB used, 85684 MB / 86687 MB avail; 9/18 degraded (50.000%)
2012-10-27 11:14:32.014302 mon.0 11.0.0.2:6789/0 30 : [DBG] osd.0 11.0.0.2:6801/24250 reported failed by osd.1 11.0.0.3:6801/8443
2012-10-27 11:14:37.033547 mon.0 11.0.0.2:6789/0 31 : [DBG] osd.0 11.0.0.2:6801/24250 reported failed by osd.1 11.0.0.3:6801/8443
2012-10-27 11:14:42.060678 mon.0 11.0.0.2:6789/0 32 : [DBG] osd.0 11.0.0.2:6801/24250 reported failed by osd.1 11.0.0.3:6801/8443
2012-10-27 11:14:42.060827 mon.0 11.0.0.2:6789/0 33 : [INF] osd.0 11.0.0.2:6801/24250 failed (3 reports from 1 peers after 30.046376 >= grace 20.000000)
2012-10-27 11:14:42.157536 mon.0 11.0.0.2:6789/0 34 : [INF] osdmap e6: 2 osds: 1 up, 2 in

osd.0 dies after a while:

2012-10-27 11:19:46.751562 mon.0 11.0.0.2:6789/0 40 : [INF] osd.0 out (down for 304.604259)
2012-10-27 11:19:46.785574 mon.0 11.0.0.2:6789/0 41 : [INF] osdmap e8: 2 osds: 1 up, 1 in
2012-10-27 11:19:46.811588 mon.0 11.0.0.2:6789/0 42 : [INF] pgmap v12: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)
2012-10-27 11:19:49.591172 mon.0 11.0.0.2:6789/0 43 : [INF] pgmap v13: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)
2012-10-27 11:20:04.671337 mon.0 11.0.0.2:6789/0 44 : [INF] pgmap v14: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)

Status of osd.1 as of now:

2012-10-28 10:48:54.022338 osd.1 11.0.0.3:6801/8443 396978 : [WRN] slow request 84884.436995 seconds old, received at 2012-10-27 11:14:09.585282: osd_op(mds.0.1:28 200.00000001 [write 131~671] 1.6e5f474 RETRY) v4 currently delayed
2012-10-28 10:48:54.022343 osd.1 11.0.0.3:6801/8443 396979 : [WRN] slow request 84851.874118 seconds old, received at 2012-10-27 11:14:42.148159: osd_op(mds.0.1:29 200.00000000 [writefull 0~84] 1.844f3494 RETRY) v4 currently delayed
2012-10-28 10:48:54.022346 osd.1 11.0.0.3:6801/8443 396980 : [WRN] slow request 81939.241084 seconds old, received at 2012-10-27 12:03:14.781193: osd_op(mds.0.1:30 200.00000001 [write 802~183] 1.6e5f474) v4 currently delayed
2012-10-28 10:48:54.022350 osd.1 11.0.0.3:6801/8443 396981 : [WRN] slow request 81939.240915 seconds old, received at 2012-10-27 12:03:14.781362: osd_op(mds.0.1:31 200.00000000 [writefull 0~84] 1.844f3494) v4 currently delayed

---
Regards,
Raghunandhan.G

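The thread does not say why "journal dio=false" helps. One plausible reading, offered here as an assumption rather than something the participants confirmed, is that the journal is opened with O_DIRECT and the ZFS file system rejected that with EINVAL. A small Linux-oriented sketch for probing whether a directory's file system accepts O_DIRECT opens; the function name and the probe file name are hypothetical.

```python
import errno
import os
import tempfile


def supports_o_direct(directory):
    """Return True if a file in `directory` can be opened with O_DIRECT."""
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:  # this platform has no O_DIRECT flag at all
        return False
    path = os.path.join(directory, ".o_direct_probe")
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | o_direct, 0o600)
    except OSError as exc:
        if exc.errno == errno.EINVAL:  # file system refuses direct I/O
            return False
        raise
    else:
        os.close(fd)
        return True
    finally:
        # The probe file can exist even when the open itself failed.
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    target = tempfile.gettempdir()
    print(f"{target}: O_DIRECT supported = {supports_o_direct(target)}")
```

Running it against the intended OSD data directory, instead of the temporary directory used in the demo, would show whether direct I/O is what the journal creation tripped over.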
* Re: [ceph-commit] Ceph Zfs
From: Raghunandhan @ 2012-10-27 5:50 UTC
To: Dan Mick; +Cc: Sage Weil, ceph-devel

Here is the ceph status when used with a zfs filesystem: the osd dies.

# ceph -s
   health HEALTH_WARN 407 pgs degraded; 169 pgs down; 169 pgs peering; 15 pgs recovering; 323 pgs stuck unclean; recovery 38/42 degraded (90.476%); 19/21 unfound (90.476%); 1/2 in osds are down
   monmap e1: 2 mons at {a=11.0.0.2:6789/0,b=11.0.0.3:6789/0}, election epoch 4, quorum 0,1 a,b
   osdmap e7: 2 osds: 1 up, 2 in
   pgmap v10: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)
   mdsmap e5: 1/1/1 up {0=a=up:active}, 1 up:standby

Log generated while both OSDs were up, until one of them went down:

2012-10-27 11:14:07.152741 mon.0 11.0.0.2:6789/0 27 : [INF] osdmap e5: 2 osds: 2 up, 2 in
2012-10-27 11:14:07.192719 mon.0 11.0.0.2:6789/0 28 : [INF] pgmap v6: 576 pgs: 576 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
2012-10-27 11:14:12.007671 mon.0 11.0.0.2:6789/0 29 : [INF] pgmap v7: 576 pgs: 272 creating, 43 active, 253 active+clean, 8 active+recovering; 1243 bytes data, 1003 MB used, 85684 MB / 86687 MB avail; 9/18 degraded (50.000%)
2012-10-27 11:14:32.014302 mon.0 11.0.0.2:6789/0 30 : [DBG] osd.0 11.0.0.2:6801/24250 reported failed by osd.1 11.0.0.3:6801/8443
2012-10-27 11:14:37.033547 mon.0 11.0.0.2:6789/0 31 : [DBG] osd.0 11.0.0.2:6801/24250 reported failed by osd.1 11.0.0.3:6801/8443
2012-10-27 11:14:42.060678 mon.0 11.0.0.2:6789/0 32 : [DBG] osd.0 11.0.0.2:6801/24250 reported failed by osd.1 11.0.0.3:6801/8443
2012-10-27 11:14:42.060827 mon.0 11.0.0.2:6789/0 33 : [INF] osd.0 11.0.0.2:6801/24250 failed (3 reports from 1 peers after 30.046376 >= grace 20.000000)
2012-10-27 11:14:42.157536 mon.0 11.0.0.2:6789/0 34 : [INF] osdmap e6: 2 osds: 1 up, 2 in
2012-10-27 11:19:46.751562 mon.0 11.0.0.2:6789/0 40 : [INF] osd.0 out (down for 304.604259)
2012-10-27 11:19:46.785574 mon.0 11.0.0.2:6789/0 41 : [INF] osdmap e8: 2 osds: 1 up, 1 in
2012-10-27 11:19:46.811588 mon.0 11.0.0.2:6789/0 42 : [INF] pgmap v12: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)
2012-10-27 11:19:49.591172 mon.0 11.0.0.2:6789/0 43 : [INF] pgmap v13: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)
2012-10-27 11:20:04.671337 mon.0 11.0.0.2:6789/0 44 : [INF] pgmap v14: 576 pgs: 15 active+recovering+degraded, 169 down+peering, 392 active+degraded; 8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; 38/42 degraded (90.476%); 19/21 unfound (90.476%)

---
Regards,
Raghunandhan.G

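The percentages in the status output are simply the ratio of the two counts printed next to them. A short sketch, with a regular expression that is an assumption of this note, that extracts each "N/M state (P%)" item from a status line and recomputes the percentage:

```python
import re

# Matches items such as "38/42 degraded (90.476%)".
RATIO_RE = re.compile(r"(\d+)/(\d+) (\w+) \(([\d.]+)%\)")


def recompute_ratios(status_line):
    """Return [(state, reported_percent, recomputed_percent), ...]."""
    results = []
    for num, den, state, reported in RATIO_RE.findall(status_line):
        recomputed = round(100.0 * int(num) / int(den), 3)
        results.append((state, float(reported), recomputed))
    return results


if __name__ == "__main__":
    line = ("8059 bytes data, 1004 MB used, 85683 MB / 86687 MB avail; "
            "38/42 degraded (90.476%); 19/21 unfound (90.476%)")
    for state, reported, recomputed in recompute_ratios(line):
        print(f"{state}: reported {reported}%, recomputed {recomputed}%")
```

Both ratios come out the same because 38/42 and 19/21 reduce to the same fraction.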
* Re: [ceph-commit] Ceph Zfs
From: Raghunandhan @ 2012-10-26 5:32 UTC
To: Sage Weil; +Cc: ceph-devel

Also, regarding the open bug that is pending: I have tried that as well.
ceph-osd starts up with a zfs volume, but some time after the ceph service
is up, the OSDs stop working. I have been working with releases from
ceph-0.30 up to the latest, 0.54, to check zfs compatibility. Kindly let me
know if this can be made to work in any way; it would be a breakthrough in
our storage design until btrfs becomes stable.

---
Regards,
Raghunandhan.G
IIHT Cloud Solutions Pvt. Ltd.
#15, 4th Floor, 'A' Wing, Sri Lakshmi Complex,
St. Marks Road, Bangalore - 560 001, India

Thread overview: 8+ messages
[not found] <441865dbf9e127f0b85193c512878a76@iihtcloudsolutions.com>
2012-10-25 15:36 ` [ceph-commit] Ceph Zfs Sage Weil
2012-10-26 4:46 ` Raghunandhan
2012-10-26 19:38 ` Dan Mick
2012-10-27 5:14 ` Raghunandhan
2012-10-27 17:15 ` Sage Weil
2012-10-28 5:19 ` Raghunandhan
2012-10-27 5:50 ` Raghunandhan
2012-10-26 5:32 ` Raghunandhan