* problems creating new ceph cluster when using journal on block device
@ 2012-11-08 7:29 Travis Rhoden
2012-11-08 8:08 ` Wido den Hollander
0 siblings, 1 reply; 7+ messages in thread
From: Travis Rhoden @ 2012-11-08 7:29 UTC (permalink / raw)
To: ceph-devel
Hey folks,
I'm trying to set up a brand new Ceph cluster, based on v0.53. My
hardware has SSDs for journals, and I'm trying to get mkcephfs to
initialize everything for me. However, the command hangs forever and I
eventually have to kill it.
After poking around a bit, it's clear that the problem has something
to do with the journal. If I comment out the journal in ceph.conf,
the commands proceed just fine. This is the first time I've tried to
throw a journal on a block device rather than a file, so maybe I've
done something wrong with that.
Here is the info from ceph.conf:
[osd]
osd journal size = 4000
[osd.0]
host = ceph1
osd journal = /dev/sda5
When I look in the log file, here is what I see:
2012-11-07 23:18:20.578623 7fe2743e3780 1
filestore(/var/lib/ceph/osd/ceph-0) mkfs in /var/lib/ceph/osd/ceph-0
2012-11-07 23:18:20.578699 7fe2743e3780 1
filestore(/var/lib/ceph/osd/ceph-0) mkfs fsid is already set to
4aac6842-8d71-4405-88ad-e3e9e4da308d
2012-11-07 23:18:20.632138 7fe2743e3780 1
filestore(/var/lib/ceph/osd/ceph-0) leveldb db exists/created
2012-11-07 23:18:20.634338 7fe2743e3780 0 journal kernel version is 3.2.0
2012-11-07 23:18:20.634579 7fe2743e3780 1 journal _open /dev/sda5 fd
9: 4194304000 bytes, block size 4096 bytes, directio = 1, aio = 0
2012-11-07 23:18:20.634995 7fe2743e3780 1 journal check: header looks ok
2012-11-07 23:18:20.636020 7fe2743e3780 1
filestore(/var/lib/ceph/osd/ceph-0) mkfs done in
/var/lib/ceph/osd/ceph-0
2012-11-07 23:18:20.682113 7fe2743e3780 0
filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is supported
and appears to work
2012-11-07 23:18:20.682125 7fe2743e3780 0
filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is disabled via
'filestore fiemap' config option
2012-11-07 23:18:20.682424 7fe2743e3780 0
filestore(/var/lib/ceph/osd/ceph-0) mount did NOT detect btrfs
2012-11-07 23:18:20.781938 7fe2743e3780 0
filestore(/var/lib/ceph/osd/ceph-0) mount syncfs(2) syscall fully
supported (by glibc and kernel)
2012-11-07 23:18:20.782061 7fe2743e3780 0
filestore(/var/lib/ceph/osd/ceph-0) mount found snaps <>
2012-11-07 23:18:20.823915 7fe2743e3780 0
filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal
mode: btrfs not detected
2012-11-07 23:18:20.826137 7fe2743e3780 0 journal kernel version is 3.2.0
2012-11-07 23:18:20.826386 7fe2743e3780 1 journal _open /dev/sda5 fd
15: 4194304000 bytes, block size 4096 bytes, directio = 1, aio = 0
So I know it is trying to use the right partition/block device. It
just never gets past that line.
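As a sanity check on the size in the journal _open line: 4194304000 bytes is what the "osd journal size = 4000" setting works out to if the value is read as megabytes of 1024*1024 bytes. A minimal sketch of that arithmetic follows; the function name is only illustrative and is not part of Ceph.

```python
def journal_size_bytes(size_mb: int) -> int:
    """Convert an 'osd journal size' value, read as MiB, to bytes."""
    return size_mb * 1024 * 1024


# Value from the [osd] section, compared with the size in the journal _open line.
configured = journal_size_bytes(4000)
reported = 4194304000
print(configured == reported)
```

So the size the OSD reports for /dev/sda5 here reflects the configured journal size rather than the partition size.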
Finally, I tried to track things down myself to see what was hanging
using strace. I ran:
strace /usr/bin/ceph-osd -c /tmp/travis/conf --monmap
/tmp/travis/monmap -i 0 --mkfs --mkkey
And the final output from that is:
open("/dev/sda5", O_RDONLY) = 15
fstat(15, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 5), ...}) = 0
ioctl(15, BLKGETSIZE64, 0x7fffe7a587a8) = 0
geteuid() = 0
pipe2([16, 17], O_CLOEXEC) = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f5365f28a50) = 707
close(17) = 0
fcntl(16, F_SETFD, 0) = 0
fstat(16, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f5365f14000
read(16, "\n/dev/sda5:\n write-caching = 1 "..., 4096) = 37
open("/proc/version", O_RDONLY) = 17
read(17, "Linux version 3.2.0-23-generic ("..., 127) = 127
futex(0x2db807c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2db8078,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x2db8028, FUTEX_WAKE_PRIVATE, 1) = 1
close(17) = 0
close(16) = 0
wait4(707, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 707
munmap(0x7f5365f14000, 4096) = 0
io_setup(128, {139996169318400}) = 0
futex(0x2db807c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2db8078,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x2db8028, FUTEX_WAKE_PRIVATE, 1) = 1
pread(15, "\2\0\0\0000\0\0\0\1\0\0\0\0\0\0\0J\254hB\215qD\5\210\255\343\351\344\3320\215"...,
4096, 0) = 4096
And that's as far as it gets. Any thoughts?
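One thing the last pread() does show is that the read of the journal header itself succeeded. A small sketch, assuming (purely from the bytes, not from any knowledge of the on-disk format) that bytes 16 to 31 of that buffer hold a UUID, decodes them for comparison with the fsid in the mkfs log line:

```python
import uuid

# First 32 bytes of the pread() buffer, copied from the strace output.
header = (b"\2\0\0\0000\0\0\0\1\0\0\0\0\0\0\0"
          b"J\254hB\215qD\5\210\255\343\351\344\3320\215")

# Assumption: bytes 16..31 are a UUID; this is inferred from the data alone.
candidate = uuid.UUID(bytes=header[16:32])
print(candidate)
```

The decoded value is 4aac6842-8d71-4405-88ad-e3e9e4da308d, the same fsid reported by "mkfs fsid is already set to", which fits the earlier "journal check: header looks ok" message.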
After some sleep, I'll try throwing the journal back on a file instead
of a block device and see if that does it.
Can anyone confirm that using a block device instead of a file
actually gives better performance?
Thanks,
- Travis
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems creating new ceph cluster when using journal on block device
2012-11-08 7:29 problems creating new ceph cluster when using journal on block device Travis Rhoden
@ 2012-11-08 8:08 ` Wido den Hollander
2012-11-08 8:24 ` Mark Kirkwood
0 siblings, 1 reply; 7+ messages in thread
From: Wido den Hollander @ 2012-11-08 8:08 UTC (permalink / raw)
To: Travis Rhoden; +Cc: ceph-devel
On 08-11-12 08:29, Travis Rhoden wrote:
> Hey folks,
>
> I'm trying to set up a brand new Ceph cluster, based on v0.53. My
> hardware has SSDs for journals, and I'm trying to get mkcephfs to
> initialize everything for me. However, the command hangs forever and I
> eventually have to kill it.
>
> After poking around a bit, it's clear that the problem has something
> to do with the journal. If I comment out the journal in ceph.conf,
> the commands proceed just fine. This is the first time I've tried to
> throw a journal on a block device rather than a file, so maybe I've
> done something wrong with that.
>
> Here is the info from ceph.conf:
>
>
> [osd]
> osd journal size = 4000
Not sure if this is the problem, but when using a block device you don't
have to specify the size for the journal.
Wido
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems creating new ceph cluster when using journal on block device
2012-11-08 8:08 ` Wido den Hollander
@ 2012-11-08 8:24 ` Mark Kirkwood
2012-11-08 15:01 ` Travis Rhoden
0 siblings, 1 reply; 7+ messages in thread
From: Mark Kirkwood @ 2012-11-08 8:24 UTC (permalink / raw)
To: Wido den Hollander; +Cc: Travis Rhoden, ceph-devel
On 08/11/12 21:08, Wido den Hollander wrote:
>
>
> On 08-11-12 08:29, Travis Rhoden wrote:
>> Hey folks,
>>
>> I'm trying to set up a brand new Ceph cluster, based on v0.53. My
>> hardware has SSDs for journals, and I'm trying to get mkcephfs to
>> initialize everything for me. However, the command hangs forever and I
>> eventually have to kill it.
>>
>> After poking around a bit, it's clear that the problem has something
>> to do with the journal. If I comment out the journal in ceph.conf,
>> the commands proceed just fine. This is the first time I've tried to
>> throw a journal on a block device rather than a file, so maybe I've
>> done something wrong with that.
>>
>> Here is the info from ceph.conf:
>>
>>
>> [osd]
>> osd journal size = 4000
>
> Not sure if this is the problem, but when using a block device you don't
> have to specify the size for the journal.
Also might be useful to know make/model of ssd, plus motherboard
make/model (in case commenting out size does not fix)!
Regards
Mark
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems creating new ceph cluster when using journal on block device
2012-11-08 8:24 ` Mark Kirkwood
@ 2012-11-08 15:01 ` Travis Rhoden
2012-11-08 15:08 ` Travis Rhoden
0 siblings, 1 reply; 7+ messages in thread
From: Travis Rhoden @ 2012-11-08 15:01 UTC (permalink / raw)
To: Mark Kirkwood; +Cc: Wido den Hollander, ceph-devel
>>> [osd]
>>> osd journal size = 4000
>>
>>
>> Not sure if this is the problem, but when using a block device you don't
>> have to specify the size for the journal.
So happy to know that, Wido! I had hoped there was a way to skip that.
Tried without it -- only difference in the logs was seeing that it
picked up the full size of the partition. So, same result.
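For reference, the partition size the OSD picks up can be checked independently. The strace shows ceph-osd using the BLKGETSIZE64 ioctl; the sketch below takes a different, simpler route (seeking to the end of the device) and is only an illustration with a made-up helper name.

```python
import os


def device_size_bytes(path: str) -> int:
    """Return the size of a block device (or regular file) by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek() returns the new offset, which at SEEK_END is the total size.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```

Calling device_size_bytes("/dev/sda5") would normally need root, and the result can be compared with the byte count in the journal _open log line.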
> Also might be useful to know make/model of ssd, plus motherboard make/model
> (in case commenting out size does not fix)!
It's an Intel X25-E, 64GB. It's a place-holder until some bigger ones
we have on order show up.
The motherboard is a SuperMicro X8DT6. SSDs are connected to onboard
SATA ports; data drives are connected to an LSI 9211-8i (SAS2008).
Maybe there is a special way I need to do the partition? My goal was
to throw 6 journals on this disk, and it is partitioned like so:
Model: ATA SSDSA2SH064G1GC (scsi)
Disk /dev/sda: 64.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size    Type      File system  Flags
 1      1049kB  512MB   511MB   primary                raid
 2      512MB   2511MB  2000MB  primary                raid
 3      2511MB  6512MB  4000MB  primary                raid
 4      6512MB  64.0GB  57.5GB  extended
 5      6513MB  15.1GB  8590MB  logical
 6      15.1GB  23.7GB  8590MB  logical
 7      23.7GB  32.3GB  8590MB  logical
 8      32.3GB  40.9GB  8590MB  logical
 9      40.9GB  49.5GB  8590MB  logical
10      49.5GB  58.1GB  8590MB  logical
So, sda5-10 are my journal partitions. I know that I have consumed
most of the drive here, and that is bad for the SSD and such, but it
really is a temporary setup.
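To put a rough number on "most of the drive", here is a small sketch that sums the partition sizes from the parted output. It assumes the rounded 64.0GB figure means 64000 MB, so the result is approximate; the function name is illustrative.

```python
def used_fraction(partition_sizes_mb, disk_size_mb):
    """Fraction of the disk covered by the given partitions."""
    return sum(partition_sizes_mb) / disk_size_mb


primary_mb = [511, 2000, 4000]   # partitions 1-3
journal_mb = [8590] * 6          # sda5..sda10, one journal each
fraction = used_fraction(primary_mb + journal_mb, 64000)
print(f"{fraction:.1%}")
```

That comes out at roughly 90% of the disk allocated, with the six journal partitions accounting for 51540 MB of it.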
- Travis
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems creating new ceph cluster when using journal on block device
2012-11-08 15:01 ` Travis Rhoden
@ 2012-11-08 15:08 ` Travis Rhoden
2012-11-08 17:36 ` Travis Rhoden
0 siblings, 1 reply; 7+ messages in thread
From: Travis Rhoden @ 2012-11-08 15:08 UTC (permalink / raw)
To: ceph-devel
One more thing -- Google search says this is harmless -- I see quite a
few of these in syslog:
hdparm: sending ioctl 2285 to a partition!
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems creating new ceph cluster when using journal on block device
2012-11-08 15:08 ` Travis Rhoden
@ 2012-11-08 17:36 ` Travis Rhoden
2012-11-08 17:41 ` Mark Nelson
0 siblings, 1 reply; 7+ messages in thread
From: Travis Rhoden @ 2012-11-08 17:36 UTC (permalink / raw)
To: ceph-devel
Solved!
I stumbled into the solution while switching from block device to a
file. I was being bitten by running mkcephfs multiple times -- it wasn't
really failing on the journal, it was failing because the OSD data
disk had been initialized before. I couldn't see that until I used a
file for the journal, and then I saw log output like:
=== osd.0 ===
2012-11-08 16:41:37.677620 7ffc3cfcd780 -1 provided osd id 0 != superblock's -1
2012-11-08 16:41:37.678726 7ffc3cfcd780 -1 ** ERROR: error creating
empty object store in /var/lib/ceph/osd/ceph-0: (22) Invalid argument
I unmounted the OSDs that had been touched before, reformatted them,
and then remounted. I set up ceph.conf to use block devices for the
journals, and then everything proceeded normally.
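A pre-flight check along these lines could catch leftover OSD data before mkcephfs is run again. This is only a sketch: the helper name is hypothetical, and it knows nothing about Ceph's own files; it just reports whether the mounted data directory is empty (ignoring lost+found, which a fresh ext-family filesystem contains).

```python
import os


def osd_dir_is_clean(path: str) -> bool:
    """Return True if the OSD data directory has no leftover entries."""
    # 'lost+found' is ignored because mkfs on ext filesystems creates it.
    leftovers = [entry for entry in os.listdir(path) if entry != "lost+found"]
    return not leftovers
```

For example, osd_dir_is_clean("/var/lib/ceph/osd/ceph-0") returning False would be a hint to reformat that data disk first.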
So the final relevant bits from my ceph.conf file look like:
[osd]
osd journal size = 0
journal dio = true
journal aio = true
[osd.0]
host = ceph1
osd journal = /dev/sda5
[osd.1]
host = ceph1
osd journal = /dev/sda6
...
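With several journals on one SSD, it is easy to point two OSDs at the same partition by mistake. The sketch below lists the journal per OSD and flags duplicates. It assumes Python's configparser is a close enough approximation of the ceph.conf syntax for a simple file like this one (the real parser has its own rules), and both function names are illustrative.

```python
import configparser
from collections import Counter

SAMPLE = """
[osd]
osd journal size = 0
journal dio = true
journal aio = true

[osd.0]
host = ceph1
osd journal = /dev/sda5

[osd.1]
host = ceph1
osd journal = /dev/sda6
"""


def journals_by_osd(conf_text: str) -> dict:
    """Map each [osd.N] section to its 'osd journal' path."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        name: parser[name]["osd journal"]
        for name in parser.sections()
        if name.startswith("osd.") and "osd journal" in parser[name]
    }


def shared_journals(journals: dict) -> list:
    """Return journal paths that are assigned to more than one OSD."""
    counts = Counter(journals.values())
    return sorted(path for path, n in counts.items() if n > 1)


print(journals_by_osd(SAMPLE))
print(shared_journals(journals_by_osd(SAMPLE)))
```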
Thanks,
- Travis
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: problems creating new ceph cluster when using journal on block device
2012-11-08 17:36 ` Travis Rhoden
@ 2012-11-08 17:41 ` Mark Nelson
0 siblings, 0 replies; 7+ messages in thread
From: Mark Nelson @ 2012-11-08 17:41 UTC (permalink / raw)
To: Travis Rhoden; +Cc: ceph-devel
On 11/08/2012 11:36 AM, Travis Rhoden wrote:
> Solved!
>
> I stumbled into the solution while switching from block device to a
> file. I was being bitten by running mkcephfs multiple times -- it wasn't
> really failing on the journal, it was failing because the OSD data
> disk had been initialized before. I couldn't see that until I used a
> file for the journal, and then I saw log output like:
Yeah, that was a change that landed a couple of months ago. It's really
important now to blow away the old data (I just reformat) if you want a
totally clean ceph deployment rather than just running mkcephfs.
^ permalink raw reply [flat|nested] 7+ messages in thread