From: "Adam Ochmański" <blink@blink.waw.pl>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: problem with hanging cluster
Date: Thu, 08 Nov 2012 12:14:56 +0100
Message-ID: <509B9430.40903@blink.waw.pl>
Hi,
our test cluster gets stuck every time one of our OSD hosts goes down.
Even after the missing OSD returns to the "up" state and recovery
reaches 100%, the cluster still does not work properly.
At the time Ceph fails, jobs are running on other hosts, which only
mount the cluster through rbd and the CephFS kernel driver. Each of the
5 clients runs a similar job in a loop, e.g.:
dd if=/dev/zero of=/mnt/ceph/$filename bs=1M count=$random;
dd if=/mnt/ceph/$filename of=/dev/null bs=512k;
rm -f /mnt/ceph/$filename
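For reference, the per-client loop can be sketched roughly as below. This is only a minimal sketch of the write/read/delete cycle described above, not the exact script we run: the function names, the file name pattern and the 1024 MiB upper bound in the usage line are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of one client's test cycle: write a file, read it back, delete it.
# random_mb, io_cycle and the "ddtest.<pid>" file name are illustrative names.

# Print a pseudo-random size in MiB between 1 and $1.
random_mb() {
    max=$1
    # Read 2 bytes from /dev/urandom as an unsigned integer (0..65535).
    n=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
    echo $(( n % max + 1 ))
}

# Run one write/read/delete cycle in directory $1 with a file of $2 MiB.
io_cycle() {
    dir=$1
    mb=$2
    file="$dir/ddtest.$$"
    dd if=/dev/zero of="$file" bs=1M count="$mb" 2>/dev/null || return 1
    dd if="$file" of=/dev/null bs=512k 2>/dev/null || return 1
    rm -f "$file"
}

# Usage (runs until a cycle fails or the loop is interrupted):
#   while io_cycle /mnt/ceph "$(random_mb 1024)"; do :; done
```

Once the cluster hangs, the first dd in such a loop never returns, so the loop simply stops making progress rather than exiting.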
When the cluster is fresh after a new deployment, this test works
properly, but after we fail one of our OSDs a few times, the cluster
stops responding and all dd processes go into state D.
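A quick way to confirm which client processes are stuck in state D (uninterruptible sleep) is to filter the ps output. The sketch below assumes the procps version of ps found on Linux; the function names are illustrative, and the wait channel shown in the WCHAN column depends on the kernel.

```shell
# Keep only rows whose STAT column starts with "D" (uninterruptible sleep).
# Expects ps output with PID, STAT, WCHAN, COMMAND columns and a header row.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# List D-state tasks on this host together with their kernel wait channel.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# Usage:
#   list_d_state
```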
Our cluster is built from 3 nodes, with the journals on an SSD disk:
System: Debian 7.0, kernel 3.2.0-3-amd64
#ceph osd tree
# id    weight  type name               up/down reweight
-1      6       root default
-3      6           rack unknownrack
-2      2               host uranos
0       1                   osd.0       up      1
1       1                   osd.1       up      1
-4      3               host node04
401     1                   osd.401     up      1
402     1                   osd.402     up      1
403     1                   osd.403     up      1
-5      1               host node03
2       1                   osd.2       up      1
#mount
/dev/sdb1 /var/lib/ceph/osd/ceph-401 ext4 rw,sync,noatime,user_xattr,barrier=0,data=writeback 0 0
/dev/sdc1 /var/lib/ceph/osd/ceph-402 ext4 rw,sync,noatime,user_xattr,barrier=0,data=writeback 0 0
/dev/sde1 /var/lib/ceph/osd/ceph-403 ext4 rw,sync,noatime,user_xattr,barrier=0,data=writeback 0 0
/dev/sdd1 /var/lib/ceph/journal ext4 rw,sync,noatime,user_xattr,barrier=0,data=writeback 0 0
/dev/sdd2 /var/lib/ceph/mon ext4 rw,sync,noatime,user_xattr,barrier=0,data=writeback 0 0
#ceph -s
health HEALTH_WARN 111 pgs peering; 111 pgs stuck inactive; 45 pgs stuck unclean
monmap e1: 1 mons at {alfa=10.32.20.46:6789/0}, election epoch 1, quorum 0 alfa
osdmap e853: 6 osds: 6 up, 6 in
pgmap v10870: 1152 pgs: 871 active+clean, 111 peering, 170 active+clean+scrubbing; 186 GB data, 478 GB used, 6500 GB / 6979 GB avail
mdsmap e9: 1/1/1 up {0=alfa=up:active}
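The pgmap line above can be cross-checked against the health line with a small filter. The sketch below is only an illustration: pg_summary is a hypothetical helper name, and it assumes the pgmap entry arrives as a single unwrapped line in the format shown in this status output. It prints the total PG count, the sum of the per-state counts, and the number of PGs whose state does not contain "active".

```shell
# Read `ceph -s`-style text on stdin and print:
#   <total pgs> <sum of per-state counts> <pgs in a non-active state>
pg_summary() {
    awk '
    /pgmap/ {
        # The total is the field right before the "pgs:" token.
        for (i = 2; i <= NF; i++) if ($i == "pgs:") total = $(i - 1)
        line = $0
        sub(/^.*pgs: */, "", line)   # drop everything up to the state list
        sub(/;.*$/, "", line)        # drop the usage figures after ";"
        n = split(line, parts, /, */)
        sum = 0; inactive = 0
        for (i = 1; i <= n; i++) {
            split(parts[i], kv, " ") # kv[1] = count, kv[2] = state name
            sum += kv[1]
            if (kv[2] !~ /active/) inactive += kv[1]
        }
        print total, sum, inactive
    }'
}

# Usage:
#   ceph -s | pg_summary
# For the status shown above this prints: 1152 1152 111
```

Here 871 + 111 + 170 = 1152, so every PG is accounted for, and the 111 peering PGs match the "111 pgs stuck inactive" count in the health line.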
### /var/log/ceph.log
http://pastebin.com/z6prrnS4
### client kernel driver CephFS
10.32.20.46:6789:/ on /mnt/ceph type ceph (rw,relatime,name=admin,secret=<hidden>)
strace ls -al /mnt/ceph
http://pastebin.com/hqwa3sDt
### ceph.conf
[global]
auth supported = cephx
[osd]
osd journal size = 1000
filestore xattr use omap = true
osd journal = /var/lib/ceph/journal/osd.$id/journal
[mon.alfa]
host = node04
mon addr = 10.32.20.46:6789
[osd.401]
host = node04
[osd.402]
host = node04
[osd.403]
host = node04
[osd.2]
host = node03
[osd.0]
host = uranos
[osd.1]
host = uranos
[mds.alfa]
host = node04
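To make the journal layout explicit: with the "osd journal" template above, each OSD gets its own journal file under /var/lib/ceph/journal, and "osd journal size" is given in megabytes. The sketch below only mimics the $id substitution for the three OSDs on node04; journal_path is a hypothetical helper, not a Ceph command, and other ceph.conf metavariables are not handled.

```shell
# Expand the "osd journal" template from the [osd] section for one OSD id.
# Only the $id metavariable is substituted in this sketch.
journal_path() {
    echo "/var/lib/ceph/journal/osd.$1/journal"
}

# Journals of the three OSDs hosted on node04.
for id in 401 402 403; do
    journal_path "$id"
done
```

With three OSDs on node04 and a journal size of 1000 MB each, roughly 3000 MB of journal ends up on /dev/sdd1.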
Any suggestions? Thanks!
--
Best,
blink