From: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: ceph cluster hangs when rebooting one node
Date: Mon, 12 Nov 2012 16:04:58 +0100
Message-ID: <50A1101A.4070401@profihost.ag>
Hello list,
I was checking what happens when I reboot a Ceph node.
Unfortunately, when I reboot one node, the whole Ceph cluster hangs and no
I/O is possible.
The output of ceph -w looks like this:
2012-11-12 16:03:58.191106 mon.0 [INF] pgmap v19013: 7032 pgs: 7032
active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:08.365557 mon.0 [INF] mon.a calling new monitor election
2012-11-12 16:04:13.422682 mon.0 [INF] mon.a@0 won leader election with
quorum 0,2
2012-11-12 16:04:13.708045 mon.0 [INF] pgmap v19014: 7032 pgs: 7032
active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:13.708059 mon.0 [INF] mdsmap e1: 0/0/1 up
2012-11-12 16:04:13.708070 mon.0 [INF] osdmap e4582: 20 osds: 20 up, 20 in
2012-11-12 16:04:08.242688 mon.2 [INF] mon.c calling new monitor election
2012-11-12 16:04:13.708089 mon.0 [INF] monmap e1: 3 mons at
{a=10.255.0.100:6789/0,b=10.255.0.101:6789/0,c=10.255.0.102:6789/0}
2012-11-12 16:04:14.070593 mon.0 [INF] pgmap v19015: 7032 pgs: 7032
active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:15.283954 mon.0 [INF] pgmap v19016: 7032 pgs: 7032
active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:18.506812 mon.0 [INF] osd.21 10.255.0.101:6800/5049
failed (3 reports from 3 peers after 20.339769 >= grace 20.000000)
2012-11-12 16:04:18.890003 mon.0 [INF] osdmap e4583: 20 osds: 19 up, 20 in
2012-11-12 16:04:19.137936 mon.0 [INF] pgmap v19017: 7032 pgs: 6720
active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294
GB / 4469 GB avail
2012-11-12 16:04:20.024595 mon.0 [INF] osdmap e4584: 20 osds: 19 up, 20 in
2012-11-12 16:04:20.330149 mon.0 [INF] pgmap v19018: 7032 pgs: 6720
active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294
GB / 4469 GB avail
2012-11-12 16:04:21.535471 mon.0 [INF] pgmap v19019: 7032 pgs: 6720
active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294
GB / 4469 GB avail
2012-11-12 16:04:24.181292 mon.0 [INF] osd.22 10.255.0.101:6803/5153
failed (3 reports from 3 peers after 23.013550 >= grace 20.000000)
2012-11-12 16:04:24.182208 mon.0 [INF] osd.23 10.255.0.101:6806/5276
failed (3 reports from 3 peers after 21.000834 >= grace 20.000000)
2012-11-12 16:04:24.671373 mon.0 [INF] pgmap v19020: 7032 pgs: 6637
active+clean, 208 stale+active+clean, 187 incomplete; 91615 MB data, 174
GB used, 4295 GB / 4469 GB avail
2012-11-12 16:04:24.829022 mon.0 [INF] osdmap e4585: 20 osds: 17 up, 20 in
2012-11-12 16:04:24.870969 mon.0 [INF] osd.24 10.255.0.101:6809/5397
failed (3 reports from 3 peers after 20.688672 >= grace 20.000000)
2012-11-12 16:04:25.522333 mon.0 [INF] pgmap v19021: 7032 pgs: 5912
active+clean, 933 stale+active+clean, 187 incomplete; 91615 MB data, 174
GB used, 4295 GB / 4469 GB avail
2012-11-12 16:04:25.596927 mon.0 [INF] osd.24 10.255.0.101:6809/5397
failed (3 reports from 3 peers after 21.708444 >= grace 20.000000)
2012-11-12 16:04:26.077545 mon.0 [INF] osdmap e4586: 20 osds: 16 up, 20 in
2012-11-12 16:04:26.606475 mon.0 [INF] pgmap v19022: 7032 pgs: 5394
active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data,
173 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:27.162034 mon.0 [INF] osdmap e4587: 20 osds: 16 up, 20 in
2012-11-12 16:04:27.656974 mon.0 [INF] pgmap v19023: 7032 pgs: 5394
active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data,
173 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:30.229958 mon.0 [INF] pgmap v19024: 7032 pgs: 5394
active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data,
172 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:31.411989 mon.0 [INF] pgmap v19025: 7032 pgs: 5394
active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data,
172 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:32.617576 mon.0 [INF] pgmap v19026: 7032 pgs: 4660
active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB /
4469 GB avail
2012-11-12 16:04:35.172861 mon.0 [INF] pgmap v19027: 7032 pgs: 4660
active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB /
4469 GB avail
2012-11-12 16:04:30.505872 osd.53 [WRN] 6 slow requests, 6 included
below; oldest blocked for > 30.247691 secs
2012-11-12 16:04:30.505875 osd.53 [WRN] slow request 30.247691 seconds
old, received at 2012-11-12 16:04:00.258118:
osd_op(client.131626.0:771962 rb.0.107a.734602d5.000000000bce [write
2478080~4096] 3.562a9efc) v4 currently reached pg
2012-11-12 16:04:30.505879 osd.53 [WRN] slow request 30.238016 seconds
old, received at 2012-11-12 16:04:00.267793:
osd_op(client.131626.0:772116 rb.0.107a.734602d5.000000001608 [write
262144~4096] 3.a47890e) v4 currently reached pg
2012-11-12 16:04:30.505881 osd.53 [WRN] slow request 30.236572 seconds
old, received at 2012-11-12 16:04:00.269237:
osd_op(client.131626.0:772141 rb.0.107a.734602d5.000000001777 [write
798720~4096] 3.547bc855) v4 currently reached pg
2012-11-12 16:04:30.505883 osd.53 [WRN] slow request 30.227850 seconds
old, received at 2012-11-12 16:04:00.277959:
osd_op(client.131626.0:772283 rb.0.107a.734602d5.0000000000a6 [write
2379776~4096] 3.5d0f2510) v4 currently reached pg
2012-11-12 16:04:30.505884 osd.53 [WRN] slow request 30.227499 seconds
old, received at 2012-11-12 16:04:00.278310:
osd_op(client.131626.0:772289 rb.0.107a.734602d5.0000000000d0 [write
3379200~4096] 3.b031884f) v4 currently reached pg
2012-11-12 16:04:30.819063 osd.52 [WRN] 6 slow requests, 6 included
below; oldest blocked for > 30.578003 secs
2012-11-12 16:04:30.819069 osd.52 [WRN] slow request 30.578003 seconds
old, received at 2012-11-12 16:04:00.240978:
osd_op(client.131626.0:771697 rb.0.107a.734602d5.000000001916 [write
3076096~4096] 3.627cbcb1) v4 currently reached pg
2012-11-12 16:04:30.819076 osd.52 [WRN] slow request 30.546967 seconds
old, received at 2012-11-12 16:04:00.272014:
osd_op(client.131626.0:772187 rb.0.107a.734602d5.000000001974 [write
1675264~4096] 3.ba912483) v4 currently reached pg
2012-11-12 16:04:30.819078 osd.52 [WRN] slow request 30.544082 seconds
old, received at 2012-11-12 16:04:00.274899:
osd_op(client.131626.0:772235 rb.0.107a.734602d5.000000001bfd [write
3686400~4096] 3.29b75f52) v4 currently reached pg
2012-11-12 16:04:30.819080 osd.52 [WRN] slow request 30.496902 seconds
old, received at 2012-11-12 16:04:00.322079:
osd_op(client.131626.0:772944 rb.0.107a.734602d5.000000000bbb [write
266240~4096] 3.5db27880) v4 currently reached pg
2012-11-12 16:04:30.819081 osd.52 [WRN] slow request 30.470500 seconds
old, received at 2012-11-12 16:04:00.348481:
osd_op(client.131626.0:773397 rb.0.107a.734602d5.000000000bbb [write
4145152~4096] 3.5db27880) v4 currently reached pg
2012-11-12 16:04:31.202553 osd.51 [WRN] 6 slow requests, 6 included
below; oldest blocked for > 30.932114 secs
2012-11-12 16:04:31.203126 osd.51 [WRN] slow request 30.932114 seconds
old, received at 2012-11-12 16:04:00.270383:
osd_op(client.131626.0:772159 rb.0.107a.734602d5.000000001826 [write
3842048~4096] 3.d489eb11) v4 currently reached pg
2012-11-12 16:04:31.203130 osd.51 [WRN] slow request 30.902220 seconds
old, received at 2012-11-12 16:04:00.300277:
osd_op(client.131626.0:772552 rb.0.107a.734602d5.000000000fd9 [write
2990080~4096] 3.e64d168c) v4 currently reached pg
2012-11-12 16:04:31.203132 osd.51 [WRN] slow request 30.895459 seconds
old, received at 2012-11-12 16:04:00.307038:
osd_op(client.131626.0:772670 rb.0.107a.734602d5.00000000177f [write
1028096~4096] 3.dad40d42) v4 currently reached pg
2012-11-12 16:04:31.203135 osd.51 [WRN] slow request 30.891418 seconds
old, received at 2012-11-12 16:04:00.311079:
osd_op(client.131626.0:772730 rb.0.107a.734602d5.000000001ac6 [write
495616~4096] 3.27fd6b11) v4 currently reached pg
2012-11-12 16:04:31.203136 osd.51 [WRN] slow request 30.845134 seconds
old, received at 2012-11-12 16:04:00.357363:
osd_op(client.131626.0:773553 rb.0.107a.734602d5.000000001688 [write
3125248~4096] 3.c83fad42) v4 currently reached pg
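To make the trend in the pgmap lines above easier to follow, here is a minimal Python sketch that pulls the per-state PG counts out of wrapped `ceph -w` output. It is only an illustration: the function names (`join_wrapped`, `pg_states`) and the regular expressions are assumptions based on the line format shown in this mail, not part of any Ceph tooling, and the sample is three entries copied from the log above.

```python
import re

# Assumption: every log entry starts with a timestamp such as
# "2012-11-12 16:04:19.137936 "; any other line is a wrapped continuation.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")
# Captures the pgmap version, the total PG count, and the state list up to ';'.
PGMAP = re.compile(r"pgmap v(\d+): (\d+) pgs: ([^;]+);")


def join_wrapped(text):
    """Rejoin entries that the mail client wrapped across several lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def pg_states(text):
    """Return (version, total_pgs, {state: count}) for each pgmap entry."""
    result = []
    for entry in join_wrapped(text):
        m = PGMAP.search(entry)
        if not m:
            continue
        states = {}
        for part in m.group(3).split(","):
            count, state = part.strip().split(" ", 1)
            states[state] = int(count)
        result.append((int(m.group(1)), int(m.group(2)), states))
    return result


# Three entries copied from the log above.
sample = """\
2012-11-12 16:04:13.708045 mon.0 [INF] pgmap v19014: 7032 pgs: 7032
active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:24.671373 mon.0 [INF] pgmap v19020: 7032 pgs: 6637
active+clean, 208 stale+active+clean, 187 incomplete; 91615 MB data, 174
GB used, 4295 GB / 4469 GB avail
2012-11-12 16:04:32.617576 mon.0 [INF] pgmap v19026: 7032 pgs: 4660
active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB /
4469 GB avail
"""

for version, total, states in pg_states(sample):
    print(version, total, states)
```

Run against the full log, this shows the active+clean count falling from 7032 to 4660 while the incomplete count grows to 2372 within about 20 seconds of the first OSD being marked failed.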
Stefan