From: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Upgrading from 0.61.5 to 0.61.6 ended in disaster
Date: Wed, 24 Jul 2013 09:05:44 +0200
Message-ID: <51EF7CC8.9070507@profihost.ag>
Hi,
today I wanted to upgrade from 0.61.5 to 0.61.6 to get rid of the mon bug.
But this ended in a complete disaster.
What I did:
1.) recompiled ceph tagged with 0.61.6
2.) installed the new ceph version on all machines
3.) tried to restart JUST ONE mon
This failed with:
[1774]: (33) Numerical argument out of domain
failed: 'ulimit -n 8192; /usr/bin/ceph-mon -i a --pid-file
/var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf '
2013-07-24 08:41:43.086951 7f53c185d700 -1 mon.a@0(leader) e1 *** Got
Signal Terminated ***
2013-07-24 08:41:43.088090 7f53c185d700 0 quorum service shutdown
2013-07-24 08:41:43.088094 7f53c185d700 0 mon.a@0(???).health(3840)
HealthMonitor::service_shutdown 1 services
2013-07-24 08:41:43.088097 7f53c185d700 0 quorum service shutdown
2013-07-24 08:41:44.224104 7fae6384a780 0 ceph version
0.61.6-15-g85db066 (85db0667307ac803c753d16fa374dd2fc29d76f3), process
ceph-mon, pid 29871
2013-07-24 08:41:56.097385 7fae6384a780 -1 mon/OSDMonitor.cc: In
function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
7fae6384a780 time 2013-07-24 08:41:56.096683
mon/OSDMonitor.cc: 156: FAILED assert(latest_full > 0)
ceph version 0.61.6-15-g85db066 (85db0667307ac803c753d16fa374dd2fc29d76f3)
1: (OSDMonitor::update_from_paxos(bool*)+0x2413) [0x50f5a3]
2: (PaxosService::refresh(bool*)+0xe6) [0x4f2c66]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48f7b7]
4: (Monitor::init_paxos()+0xe5) [0x48f955]
5: (Monitor::preinit()+0x679) [0x4bba79]
6: (main()+0x36b0) [0x484bb0]
7: (__libc_start_main()+0xfd) [0x7fae619a6c8d]
8: /usr/bin/ceph-mon() [0x4801e9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
-13> 2013-07-24 08:41:44.222821 7fae6384a780 5 asok(0x2698000)
register_command perfcounters_dump hook 0x2682010
-12> 2013-07-24 08:41:44.222835 7fae6384a780 5 asok(0x2698000)
register_command 1 hook 0x2682010
-11> 2013-07-24 08:41:44.222837 7fae6384a780 5 asok(0x2698000)
register_command perf dump hook 0x2682010
-10> 2013-07-24 08:41:44.222842 7fae6384a780 5 asok(0x2698000)
register_command perfcounters_schema hook 0x2682010
-9> 2013-07-24 08:41:44.222845 7fae6384a780 5 asok(0x2698000)
register_command 2 hook 0x2682010
-8> 2013-07-24 08:41:44.222847 7fae6384a780 5 asok(0x2698000)
register_command perf schema hook 0x2682010
-7> 2013-07-24 08:41:44.222849 7fae6384a780 5 asok(0x2698000)
register_command config show hook 0x2682010
-6> 2013-07-24 08:41:44.222852 7fae6384a780 5 asok(0x2698000)
register_command config set hook 0x2682010
-5> 2013-07-24 08:41:44.222854 7fae6384a780 5 asok(0x2698000)
register_command log flush hook 0x2682010
-4> 2013-07-24 08:41:44.222856 7fae6384a780 5 asok(0x2698000)
register_command log dump hook 0x2682010
-3> 2013-07-24 08:41:44.222859 7fae6384a780 5 asok(0x2698000)
register_command log reopen hook 0x2682010
-2> 2013-07-24 08:41:44.224104 7fae6384a780 0 ceph version
0.61.6-15-g85db066 (85db0667307ac803c753d16fa374dd2fc29d76f3), process
ceph-mon, pid 29871
-1> 2013-07-24 08:41:44.224397 7fae6384a780 1 finished
global_init_daemonize
0> 2013-07-24 08:41:56.097385 7fae6384a780 -1 mon/OSDMonitor.cc: In
function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
7fae6384a780 time 2013-07-24 08:41:56.096683
mon/OSDMonitor.cc: 156: FAILED assert(latest_full > 0)
ceph version 0.61.6-15-g85db066 (85db0667307ac803c753d16fa374dd2fc29d76f3)
1: (OSDMonitor::update_from_paxos(bool*)+0x2413) [0x50f5a3]
2: (PaxosService::refresh(bool*)+0xe6) [0x4f2c66]
3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x48f7b7]
4: (Monitor::init_paxos()+0xe5) [0x48f955]
5: (Monitor::preinit()+0x679) [0x4bba79]
6: (main()+0x36b0) [0x484bb0]
7: (__libc_start_main()+0xfd) [0x7fae619a6c8d]
8: /usr/bin/ceph-mon() [0x4801e9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
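For anyone triaging a log like the one above, here is a minimal sketch of how the failed assertion could be pulled out of a ceph-mon log. The function name, the regular expression, and the SAMPLE text are illustrative assumptions based only on the line format shown in this message; they are not part of Ceph.

```python
import re

# Assumed line format, taken from the log above:
#   mon/OSDMonitor.cc: 156: FAILED assert(latest_full > 0)
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def find_failed_asserts(log_text):
    """Return unique (file, line, expression) tuples in order of first appearance."""
    seen = []
    for raw in log_text.splitlines():
        match = ASSERT_RE.match(raw.strip())
        if match:
            hit = (match.group("file"), int(match.group("line")), match.group("expr"))
            # The crash dump repeats the assertion, so skip duplicates.
            if hit not in seen:
                seen.append(hit)
    return seen


# Excerpt copied from the log in this message (hypothetical input for the sketch).
SAMPLE = """\
2013-07-24 08:41:56.097385 7fae6384a780 -1 mon/OSDMonitor.cc: In
function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread
7fae6384a780 time 2013-07-24 08:41:56.096683
mon/OSDMonitor.cc: 156: FAILED assert(latest_full > 0)
"""

if __name__ == "__main__":
    for source_file, line_no, expr in find_failed_asserts(SAMPLE):
        print(f"{source_file}:{line_no} assert({expr})")
```

Deduplicating matters here because the same assertion appears once in the main log and again in the "dump of recent events" section.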
4.) I thought: no problem, mon.b and mon.c are still running. BUT all OSDs
were still trying to reach mon.a:
2013-07-24 08:41:43.088997 7f011268f700 0 monclient: hunting for new mon
2013-07-24 08:41:56.792449 7f0109e7e700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x489e000 sd=286 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:42:02.792990 7f0116b6c700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x3c02780 sd=256 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:42:11.793525 7f0109d7d700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x84ec280 sd=256 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:42:23.794315 7f0109e7e700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x44c7b80 sd=286 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:42:27.621336 7f0122d2e700 0 log [WRN] : 5 slow requests,
5 included below; oldest blocked for > 30.378391 secs
2013-07-24 08:42:27.621344 7f0122d2e700 0 log [WRN] : slow request
30.378391 seconds old, received at 2013-07-24 08:41:57.242902:
osd_op(client.14727601.0:3839848
rbd_data.e0b5b26b8b4567.0000000000005b5a [write 684032~4096] 5.816d89d1
snapc bef=[bef] e142137) v4 currently wait for new map
2013-07-24 08:42:27.621348 7f0122d2e700 0 log [WRN] : slow request
30.195074 seconds old, received at 2013-07-24 08:41:57.426219:
osd_op(client.14828945.0:1088870
rbd_data.e245696b8b4567.000000000000140e [write 988160~7168] 5.ed959c36
snapc b80=[b80] e142137) v4 currently wait for new map
2013-07-24 08:42:27.621350 7f0122d2e700 0 log [WRN] : slow request
30.148871 seconds old, received at 2013-07-24 08:41:57.472422:
osd_op(client.14667314.0:2818172
rbd_data.dfcaa86b8b4567.0000000000000a13 [write 1654784~4096] 5.6972a67e
snapc baa=[baa] e142137) v4 currently wait for new map
2013-07-24 08:42:27.621351 7f0122d2e700 0 log [WRN] : slow request
30.148829 seconds old, received at 2013-07-24 08:41:57.472464:
osd_op(client.14667314.0:2818173
rbd_data.dfcaa86b8b4567.0000000000000a13 [write 1957888~4096] 5.6972a67e
snapc baa=[baa] e142137) v4 currently wait for new map
2013-07-24 08:42:27.621352 7f0122d2e700 0 log [WRN] : slow request
30.148784 seconds old, received at 2013-07-24 08:41:57.472509:
osd_op(client.14667314.0:2818174
rbd_data.dfcaa86b8b4567.0000000000000a13 [write 1966080~4096] 5.6972a67e
snapc baa=[baa] e142137) v4 currently wait for new map
...
2013-07-24 08:50:20.826687 7f00ee6d9700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0xdf02280 sd=288 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:50:26.826914 7f00f1697700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x465a000 sd=229 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:50:40.713100 7f00ee6d9700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x4383680 sd=281 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:50:44.828164 7f011392a700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x41ecf00 sd=281 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:51:02.829357 7f00f1697700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x1d8b180 sd=281 :0 s=1 pgs=0 cs=0 l=1).fault
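To check whether an OSD really keeps retrying a single monitor address, the connection faults in a log like the one above could be counted per peer. This is only a sketch: the function name, the regular expression, and the SAMPLE text are assumptions derived from the wrapped `>> <peer> pipe(...).fault` lines shown here, not a Ceph tool.

```python
import re
from collections import Counter

# Assumed fault format, possibly wrapped across two lines as in the log above:
#   ... -- 10.255.0.82:6802/29397 >>
#   10.255.0.100:6789/0 pipe(0x489e000 sd=286 :0 s=1 pgs=0 cs=0 l=1).fault
FAULT_RE = re.compile(r">>\s+(\S+)\s+pipe\([^)]*\)\.fault")


def count_faults_by_peer(log_text):
    """Count connection faults per remote address (the address after '>>')."""
    return Counter(FAULT_RE.findall(log_text))


# Excerpt copied from the OSD log in this message (hypothetical input for the sketch).
SAMPLE = """\
2013-07-24 08:41:56.792449 7f0109e7e700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x489e000 sd=286 :0 s=1 pgs=0 cs=0 l=1).fault
2013-07-24 08:42:02.792990 7f0116b6c700 0 -- 10.255.0.82:6802/29397 >>
10.255.0.100:6789/0 pipe(0x3c02780 sd=256 :0 s=1 pgs=0 cs=0 l=1).fault
"""

if __name__ == "__main__":
    for peer, count in count_faults_by_peer(SAMPLE).most_common():
        print(peer, count)
```

If every fault lands on the same address, as in the excerpt, that supports the observation that the OSD never moved on to another monitor.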
Stefan