* cosd locks up with 100% CPU during mkcephfs
[not found] <8969876631394566913@unknownmsgid>
@ 2010-10-09 0:50 ` Martin
0 siblings, 0 replies; 3+ messages in thread
From: Martin @ 2010-10-09 0:50 UTC (permalink / raw)
To: ceph-devel
Dear Mailinglist Members,
My problem is that mkcephfs does not run to completion. It stops when cosd
locks up at 100% CPU on the first node, named CEPH1.
Output of the script:
fs created label (null) on /dev/sdb1
nodesize 4096 leafsize 4096 sectorsize 4096 size 19.99GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
monmap.4203 100% 477 0.5KB/s 00:00
--- ssh ceph1 "cd /home/ceph/ceph/ceph-0.21.3/src ; ulimit -c unlimited ;
/usr/local/bin/cosd -c /etc/ceph/ceph.conf --monmap /tmp/monmap.4203 -i 0
--mkfs --osd-data /data/osd0"
** WARNING: Ceph is still under heavy development, and is only suitable for **
**          testing and review.  Do not trust it with important data.       **
-> then the script never returns
Top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4691 root 20 0 15164 2228 1864 R 94.8 0.9 25:54.65 cosd
root@CEPH1:/var/log/ceph# kill 4691
bash: line 1: 4691 Terminated /usr/local/bin/cosd -c
/etc/ceph/ceph.conf --monmap /tmp/monmap.4203 -i 0 --mkfs --osd-data
/data/osd0
failed: 'ssh ceph1 /usr/local/bin/cosd -c /etc/ceph/ceph.conf --monmap
/tmp/monmap.4203 -i 0 --mkfs --osd-data /data/osd0'
-> I waited 1 hour, with no success.
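While the process is still spinning, one quick check is whether the CPU time is
accumulating in user space (a busy loop inside cosd) or in the kernel (for
example, inside a filesystem call). The following is only a minimal sketch: the
helper name cpu_split is hypothetical, and it assumes the field layout of
/proc/<pid>/stat described in proc(5).

```shell
#!/bin/sh
# cpu_split is a hypothetical helper, not part of Ceph.
# It takes one line in /proc/<pid>/stat format and prints the user and
# kernel CPU ticks consumed so far.
cpu_split() {
    # Drop "pid (comm) " so a command name containing spaces cannot shift fields.
    rest="${1##*) }"
    # After the cut, field 1 is the process state; utime and stime (fields 14
    # and 15 of the full line) are therefore fields 12 and 13 here.
    set -- $rest
    shift 11
    echo "user_ticks=$1 kernel_ticks=$2"
}

# Example on the affected node (PID taken from the top output):
#   cpu_split "$(cat /proc/4691/stat)"
```

Running it twice a few seconds apart shows which counter is growing; dividing
the ticks by the value of `getconf CLK_TCK` converts them to seconds.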
A tail on osd.0.log
10.10.09_01:27:52.636060 b77856d0 journal header: block_size 4096 alignment
4096 max_size 0
10.10.09_01:27:52.636074 b77856d0 journal header: start 4096
10.10.09_01:27:52.636086 b77856d0 journal write_pos 0
10.10.09_01:27:52.646211 b77856d0 journal create done
10.10.09_01:27:52.646282 b77856d0 filestore(/data/osd0) mkjournal created
journal on /data/osd0/journal
10.10.09_01:27:52.646451 b77856d0 filestore(/data/osd0) mkfs done in
/data/osd0
10.10.09_01:27:52.646467 b77856d0 filestore(/data/osd0) basedir /data/osd0
journal /data/osd0/journal
10.10.09_01:27:52.646738 b77856d0 filestore(/data/osd0) mount detected btrfs
10.10.09_01:27:52.646766 b77856d0 filestore(/data/osd0) _do_clone_range 0~1
10.10.09_01:27:52.646784 b77856d0 filestore(/data/osd0) mount btrfs
CLONE_RANGE ioctl is supported
10.10.09_01:27:52.656929 b77856d0 filestore(/data/osd0) mount btrfs
SNAP_CREATE is supported
10.10.09_01:27:52.663403 b77856d0 filestore(/data/osd0) mount btrfs
SNAP_DESTROY is supported
10.10.09_01:27:52.663539 b77856d0 filestore(/data/osd0) mount fsid is
206080828
10.10.09_01:27:52.663655 b77856d0 filestore(/data/osd0) mount found snaps <>
10.10.09_01:27:52.663938 b77856d0 filestore(/data/osd0) mount op_seq is 0
10.10.09_01:27:52.663956 b77856d0 filestore(/data/osd0) open_journal at
/data/osd0/journal
10.10.09_01:27:52.663985 b77856d0 journal journal_replay fs op_seq 0
10.10.09_01:27:52.664008 b77856d0 journal open /data/osd0/journal next_seq 1
10.10.09_01:27:52.664038 b77856d0 journal _open journal is not a block
device, NOT checking disk write cache on /data/osd0/journal
10.10.09_01:27:52.664052 b77856d0 journal _open /data/osd0/journal fd 8:
8192 bytes, block size 4096 bytes, directio = 1
10.10.09_01:27:52.664067 b77856d0 journal read_header
10.10.09_01:27:52.665300 b77856d0 journal header: block_size 4096 alignment
4096 max_size 0
10.10.09_01:27:52.665352 b77856d0 journal header: start 4096
10.10.09_01:27:52.665365 b77856d0 journal write_pos 4096
10.10.09_01:27:52.665389 b77856d0 journal open header.fsid =
206080828
____________________________________________________________________________
___
I just downloaded http://ceph.newdream.net/download/ceph-0.21.3.tar.gz on an
Ubuntu 10.10 Server (x32).
root@CEPH1:/etc/ceph# uname -a
Linux CEPH1 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010
i686 GNU/Linux
I ran configure and make, followed by an install. Then I cloned the machine 3
more times, making 4 nodes. /dev/sdb1 is formatted with btrfs.
The nodes are running in VirtualBox
root@CEPH1:/etc/ceph# cat /etc/hosts
127.0.0.1 localhost
x.x.239.140 CEPH1
x.x.239.141 CEPH2
x.x.239.142 CEPH3
x.x.239.143 CEPH4
Distributed ssh-keys, so the scripts run through.
My ceph.conf looks like this:
root@CEPH1:/etc/ceph# cat ceph.conf
; global
[global]
; enable secure authentication
auth supported = cephx
; monitors
[mon]
mon data = /data/mon$id
debug ms = 1
debug mon = 20
debug paxos = 20
debug auth = 20
[mon0]
host = CEPH1
mon addr = x.x.239.140:6789
[mon1]
host = CEPH2
mon addr = x.x.239.141:6789
[mon2]
host = CEPH3
mon addr = x.x.239.142:6789
; mds
; You need at least one. Define two to get a standby.
[mds]
; where the mds keeps its secret encryption keys
keyring = /data/keyring.$name
; mds logging to debug issues.
debug ms = 1
debug mds = 20
[mds.ceph1]
host = ceph1
[mds.ceph3]
host = ceph3
; osd
[osd]
osd data = /data/osd$id
osd journal = /data/osd$id/journal
debug ms = 1
debug osd = 20
debug filestore = 20
debug journal = 20
[osd0]
host = ceph1
btrfs devs = /dev/sdb1
[osd1]
host = ceph2
btrfs devs = /dev/sdb1
[osd2]
host = ceph3
btrfs devs = /dev/sdb1
[osd3]
host = ceph4
btrfs devs = /dev/sdb1
--
Martin
^ permalink raw reply [flat|nested] 3+ messages in thread
* cosd locks up with 100% CPU during mkcephfs
@ 2010-10-09 6:26 martin
2010-10-11 3:53 ` Sage Weil
0 siblings, 1 reply; 3+ messages in thread
From: martin @ 2010-10-09 6:26 UTC (permalink / raw)
To: ceph-devel
Dear Mailinglist Members,
My problem is that mkcephfs does not run to completion. It stops when cosd
locks up at 100% CPU on the first node, named CEPH1.
Output of the script:
fs created label (null) on /dev/sdb1
nodesize 4096 leafsize 4096 sectorsize 4096 size 19.99GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
monmap.4203 100% 477 0.5KB/s 00:00
--- ssh ceph1 "cd /home/ceph/ceph/ceph-0.21.3/src ; ulimit -c unlimited ;
/usr/local/bin/cosd -c /etc/ceph/ceph.conf --monmap /tmp/monmap.4203 -i 0
--mkfs --osd-data /data/osd0"
** WARNING: Ceph is still under heavy development, and is only suitable for **
**          testing and review.  Do not trust it with important data.       **
-> then the script never returns
Top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4691 root 20 0 15164 2228 1864 R 94.8 0.9 25:54.65 cosd
root@CEPH1:/var/log/ceph# kill 4691
bash: line 1: 4691 Terminated /usr/local/bin/cosd -c
/etc/ceph/ceph.conf --monmap /tmp/monmap.4203 -i 0 --mkfs --osd-data
/data/osd0
failed: 'ssh ceph1 /usr/local/bin/cosd -c /etc/ceph/ceph.conf --monmap
/tmp/monmap.4203 -i 0 --mkfs --osd-data /data/osd0'
-> I waited 1 hour, with no success.
A tail on osd.0.log
10.10.09_01:27:52.636060 b77856d0 journal header: block_size 4096 alignment
4096 max_size 0
10.10.09_01:27:52.636074 b77856d0 journal header: start 4096
10.10.09_01:27:52.636086 b77856d0 journal write_pos 0
10.10.09_01:27:52.646211 b77856d0 journal create done
10.10.09_01:27:52.646282 b77856d0 filestore(/data/osd0) mkjournal created
journal on /data/osd0/journal
10.10.09_01:27:52.646451 b77856d0 filestore(/data/osd0) mkfs done in
/data/osd0
10.10.09_01:27:52.646467 b77856d0 filestore(/data/osd0) basedir /data/osd0
journal /data/osd0/journal
10.10.09_01:27:52.646738 b77856d0 filestore(/data/osd0) mount detected btrfs
10.10.09_01:27:52.646766 b77856d0 filestore(/data/osd0) _do_clone_range 0~1
10.10.09_01:27:52.646784 b77856d0 filestore(/data/osd0) mount btrfs
CLONE_RANGE ioctl is supported
10.10.09_01:27:52.656929 b77856d0 filestore(/data/osd0) mount btrfs
SNAP_CREATE is supported
10.10.09_01:27:52.663403 b77856d0 filestore(/data/osd0) mount btrfs
SNAP_DESTROY is supported
10.10.09_01:27:52.663539 b77856d0 filestore(/data/osd0) mount fsid is
206080828
10.10.09_01:27:52.663655 b77856d0 filestore(/data/osd0) mount found snaps <>
10.10.09_01:27:52.663938 b77856d0 filestore(/data/osd0) mount op_seq is 0
10.10.09_01:27:52.663956 b77856d0 filestore(/data/osd0) open_journal at
/data/osd0/journal
10.10.09_01:27:52.663985 b77856d0 journal journal_replay fs op_seq 0
10.10.09_01:27:52.664008 b77856d0 journal open /data/osd0/journal next_seq 1
10.10.09_01:27:52.664038 b77856d0 journal _open journal is not a block
device, NOT checking disk write cache on /data/osd0/journal
10.10.09_01:27:52.664052 b77856d0 journal _open /data/osd0/journal fd 8:
8192 bytes, block size 4096 bytes, directio = 1
10.10.09_01:27:52.664067 b77856d0 journal read_header
10.10.09_01:27:52.665300 b77856d0 journal header: block_size 4096 alignment
4096 max_size 0
10.10.09_01:27:52.665352 b77856d0 journal header: start 4096
10.10.09_01:27:52.665365 b77856d0 journal write_pos 4096
10.10.09_01:27:52.665389 b77856d0 journal open header.fsid = 206080828
____________________________________________________________________________
___
I just downloaded http://ceph.newdream.net/download/ceph-0.21.3.tar.gz on an
Ubuntu 10.10 Server (x32).
root@CEPH1:/etc/ceph# uname -a
Linux CEPH1 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010
i686 GNU/Linux
I ran configure and make, followed by an install. Then I cloned the machine 3
more times, making 4 nodes. /dev/sdb1 is formatted with btrfs.
The nodes are running in VirtualBox
root@CEPH1:/etc/ceph# cat /etc/hosts
127.0.0.1 localhost
x.x.239.140 CEPH1
x.x.239.141 CEPH2
x.x.239.142 CEPH3
x.x.239.143 CEPH4
Distributed ssh-keys, so the scripts run through.
My ceph.conf looks like this:
root@CEPH1:/etc/ceph# cat ceph.conf
; global
[global]
; enable secure authentication
auth supported = cephx
; monitors
[mon]
mon data = /data/mon$id
debug ms = 1
debug mon = 20
debug paxos = 20
debug auth = 20
[mon0]
host = CEPH1
mon addr = x.x.239.140:6789
[mon1]
host = CEPH2
mon addr = x.x.239.141:6789
[mon2]
host = CEPH3
mon addr = x.x.239.142:6789
; mds
; You need at least one. Define two to get a standby.
[mds]
; where the mds keeps its secret encryption keys
keyring = /data/keyring.$name
; mds logging to debug issues.
debug ms = 1
debug mds = 20
[mds.ceph1]
host = ceph1
[mds.ceph3]
host = ceph3
; osd
[osd]
osd data = /data/osd$id
osd journal = /data/osd$id/journal
debug ms = 1
debug osd = 20
debug filestore = 20
debug journal = 20
[osd0]
host = ceph1
btrfs devs = /dev/sdb1
[osd1]
host = ceph2
btrfs devs = /dev/sdb1
[osd2]
host = ceph3
btrfs devs = /dev/sdb1
[osd3]
host = ceph4
btrfs devs = /dev/sdb1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: cosd locks up with 100% CPU during mkcephfs
2010-10-09 6:26 cosd locks up with 100% CPU during mkcephfs martin
@ 2010-10-11 3:53 ` Sage Weil
0 siblings, 0 replies; 3+ messages in thread
From: Sage Weil @ 2010-10-11 3:53 UTC (permalink / raw)
To: martin; +Cc: ceph-devel
Hi Martin,
Can you attach to cosd with gdb and get a backtrace? Something like
# gdb /usr/bin/cosd `pgrep cosd`
[...]
(gdb) bt
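
For a multi-threaded daemon it can also help to capture the backtrace of every
thread non-interactively, so the output can be pasted into a reply. The sketch
below is only an illustration: bt_command is a hypothetical helper name, and
the default path /usr/local/bin/cosd is an assumption taken from the mkcephfs
output earlier in the thread (adjust it if cosd is installed elsewhere). It
relies on gdb's -batch and -ex options and the "thread apply all bt" command.

```shell
#!/bin/sh
# bt_command is a hypothetical helper: it only prints the gdb command line, so
# it can be reviewed before being run as root on the affected node.
bt_command() {
    pid="$1"
    # Assumption: cosd lives in /usr/local/bin, as in the mkcephfs output.
    bin="${2:-/usr/local/bin/cosd}"
    printf 'gdb -batch -ex "thread apply all bt" %s %s\n' "$bin" "$pid"
}

# Example, while cosd is still spinning:
#   eval "$(bt_command "$(pgrep -o cosd)")" > /tmp/cosd-bt.txt 2>&1
```

Printing the command instead of running it keeps the helper free of side
effects; the backtrace is only useful if it is taken while the process is
still consuming CPU.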
Thanks!
sage
On Sat, 9 Oct 2010, martin wrote:
> Dear Mailinglist Members,
>
> I have the problem that mkcephfs does not run through. It stops when cosd
> locks up with 100%CPU on the first node - named CEPH1.
> Out of the script:
> fs created label (null) on /dev/sdb1
> nodesize 4096 leafsize 4096 sectorsize 4096 size 19.99GB
> Btrfs Btrfs v0.19
> Scanning for Btrfs filesystems
> monmap.4203 100% 477 0.5KB/s 00:00
> --- ssh ceph1 "cd /home/ceph/ceph/ceph-0.21.3/src ; ulimit -c unlimited ;
> /usr/local/bin/cosd -c /etc/ceph/ceph.conf --monmap /tmp/monmap.4203 -i 0
> --mkfs --osd-data /data/osd0"
> ** WARNING: Ceph is still under heavy development, and is only suitable for
> **
> ** testing and review. Do not trust it with important data.
> **
> -> then the script never returns
> Top
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 4691 root 20 0 15164 2228 1864 R 94.8 0.9 25:54.65 cosd
> root@CEPH1:/var/log/ceph# kill 4691
> bash: line 1: 4691 Terminated /usr/local/bin/cosd -c
> /etc/ceph/ceph.conf --monmap /tmp/monmap.4203 -i 0 --mkfs --osd-data
> /data/osd0
> failed: 'ssh ceph1 /usr/local/bin/cosd -c /etc/ceph/ceph.conf --monmap
> /tmp/monmap.4203 -i 0 --mkfs --osd-data /data/osd0'
>
> -> I have waited 1hour .. and no success.
>
> A tail on osd.0.log
>
> 10.10.09_01:27:52.636060 b77856d0 journal header: block_size 4096 alignment
> 4096 max_size 0
> 10.10.09_01:27:52.636074 b77856d0 journal header: start 4096
> 10.10.09_01:27:52.636086 b77856d0 journal write_pos 0
> 10.10.09_01:27:52.646211 b77856d0 journal create done
> 10.10.09_01:27:52.646282 b77856d0 filestore(/data/osd0) mkjournal created
> journal on /data/osd0/journal
> 10.10.09_01:27:52.646451 b77856d0 filestore(/data/osd0) mkfs done in
> /data/osd0
> 10.10.09_01:27:52.646467 b77856d0 filestore(/data/osd0) basedir /data/osd0
> journal /data/osd0/journal
> 10.10.09_01:27:52.646738 b77856d0 filestore(/data/osd0) mount detected btrfs
> 10.10.09_01:27:52.646766 b77856d0 filestore(/data/osd0) _do_clone_range 0~1
> 10.10.09_01:27:52.646784 b77856d0 filestore(/data/osd0) mount btrfs
> CLONE_RANGE ioctl is supported
> 10.10.09_01:27:52.656929 b77856d0 filestore(/data/osd0) mount btrfs
> SNAP_CREATE is supported
> 10.10.09_01:27:52.663403 b77856d0 filestore(/data/osd0) mount btrfs
> SNAP_DESTROY is supported
> 10.10.09_01:27:52.663539 b77856d0 filestore(/data/osd0) mount fsid is
> 206080828
> 10.10.09_01:27:52.663655 b77856d0 filestore(/data/osd0) mount found snaps <>
> 10.10.09_01:27:52.663938 b77856d0 filestore(/data/osd0) mount op_seq is 0
> 10.10.09_01:27:52.663956 b77856d0 filestore(/data/osd0) open_journal at
> /data/osd0/journal
> 10.10.09_01:27:52.663985 b77856d0 journal journal_replay fs op_seq 0
> 10.10.09_01:27:52.664008 b77856d0 journal open /data/osd0/journal next_seq 1
> 10.10.09_01:27:52.664038 b77856d0 journal _open journal is not a block
> device, NOT checking disk write cache on /data/osd0/journal
> 10.10.09_01:27:52.664052 b77856d0 journal _open /data/osd0/journal fd 8:
> 8192 bytes, block size 4096 bytes, directio = 1
> 10.10.09_01:27:52.664067 b77856d0 journal read_header
> 10.10.09_01:27:52.665300 b77856d0 journal header: block_size 4096 alignment
> 4096 max_size 0
> 10.10.09_01:27:52.665352 b77856d0 journal header: start 4096
> 10.10.09_01:27:52.665365 b77856d0 journal write_pos 4096
> 10.10.09_01:27:52.665389 b77856d0 journal open header.fsid = 206080828
> ____________________________________________________________________________
> ___
>
>
> I just downloaded http://ceph.newdream.net/download/ceph-0.21.3.tar.gz on a
> Ubuntu 10.10 Server (x32).
> root@CEPH1:/etc/ceph# uname -a
> Linux CEPH1 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010
> i686 GNU/Linux
> Done configure and make, followed by an install. Then I cloned the machine 3
> more times, making 4 nodes. /dev/sdb1 is formatted with btrfs.
>
> The nodes are running in VirtualBox
> root@CEPH1:/etc/ceph# cat /etc/hosts
> 127.0.0.1 localhost
> x.x.239.140 CEPH1
> x.x.239.141 CEPH2
> x.x.239.142 CEPH3
> x.x.239.143 CEPH4
>
> Distributed ssh-keys, so the scripts run through.
>
> My ceph.conf looks like this:
> root@CEPH1:/etc/ceph# cat ceph.conf
> ; global
> [global]
> ; enable secure authentication
> auth supported = cephx
>
> ; monitors
> [mon]
> mon data = /data/mon$id
> debug ms = 1
> debug mon = 20
> debug paxos = 20
> debug auth = 20
>
> [mon0]
> host = CEPH1
> mon addr = x.x.239.140:6789
>
> [mon1]
> host = CEPH2
> mon addr = x.x.239.141:6789
>
> [mon2]
> host = CEPH3
> mon addr = x.x.239.142:6789
>
> ; mds
> ; You need at least one. Define two to get a standby.
> [mds]
> ; where the mds keeps its secret encryption keys
> keyring = /data/keyring.$name
> ; mds logging to debug issues.
> debug ms = 1
> debug mds = 20
>
> [mds.ceph1]
> host = ceph1
> [mds.ceph3]
> host = ceph3
>
> ; osd
> [osd]
> osd data = /data/osd$id
> osd journal = /data/osd$id/journal
> debug ms = 1
> debug osd = 20
> debug filestore = 20
> debug journal = 20
>
> [osd0]
> host = ceph1
> btrfs devs = /dev/sdb1
> [osd1]
> host = ceph2
> btrfs devs = /dev/sdb1
> [osd2]
> host = ceph3
> btrfs devs = /dev/sdb1
> [osd3]
> host = ceph4
> btrfs devs = /dev/sdb1
>
^ permalink raw reply [flat|nested] 3+ messages in thread