* Assert in OSDMap::Incremental::decode with recent git
From: Josh Pieper @ 2011-11-15 11:37 UTC
To: ceph-devel
Hello,
I was trying to test http://tracker.newdream.net/issues/1708, but in
the process I keep getting asserts in one of my monitors.
All I have done is bring up a simple 3-node cluster, each node running
a mon, an osd, and an mds. All nodes run amd64 Ubuntu 11.10 with git
2e195500. Once the cluster is up, I apply an rbd load: a single VM
reading large files. Three times now I have gotten the same assert
out of the mon, inlined below.
Thoughts?
Regards,
Josh
-----------------------------
2011-11-15 06:22:01.912870 7feb50cc37a0 ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97), process ceph-mon, pid 29906
2011-11-15 06:22:01.912955 7feb50cc37a0 store(/data/mon2) mount
2011-11-15 06:22:01.971112 7feb50cc37a0 mon.2@2(probing) e1 init fsid eeca00c2-c99c-40d4-9458-5e7ce7dd648c
2011-11-15 06:22:17.597117 7feb4cdd1700 log [INF] : mon.2 calling new monitor election
2011-11-15 06:22:31.034562 7feb4cdd1700 log [INF] : mon.2 calling new monitor election
*** Caught signal (Aborted) **
in thread 7feb4cdd1700
ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)
1: /usr/bin/ceph-mon() [0x5c2fa6]
2: (()+0x10060) [0x7feb508a4060]
3: (gsignal()+0x35) [0x7feb4f0253a5]
4: (abort()+0x17b) [0x7feb4f028b0b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7feb4f8e3d7d]
6: (()+0xb9f26) [0x7feb4f8e1f26]
7: (()+0xb9f53) [0x7feb4f8e1f53]
8: (()+0xba04e) [0x7feb4f8e204e]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x596237]
10: (OSDMap::Incremental::decode(ceph::buffer::list::iterator&)+0x3f) [0x573b6f]
11: (OSDMonitor::update_from_paxos()+0x7b0) [0x49a9c0]
12: (PaxosService::_active()+0x39) [0x4933f9]
13: (Context::complete(int)+0xa) [0x47c12a]
14: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xca) [0x47da8a]
15: (Paxos::handle_lease(MMonPaxos*)+0x36d) [0x48aafd]
16: (Paxos::dispatch(PaxosServiceMessage*)+0x21b) [0x48f4db]
17: (Monitor::_ms_dispatch(Message*)+0xcbf) [0x47b66f]
18: (Monitor::ms_dispatch(Message*)+0x35) [0x486425]
19: (SimpleMessenger::dispatch_entry()+0x84b) [0x583e8b]
20: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x46612c]
21: (()+0x7efc) [0x7feb5089befc]
22: (clone()+0x6d) [0x7feb4f0d089d]
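A note on reading this trace: frames 5 through 10 suggest the abort is not a literal assert() but an exception thrown by the bounds-checked buffer copy inside OSDMap::Incremental::decode that nothing caught, so the C++ runtime's terminate handler called abort(). The C++ sketch below illustrates that pattern under stated assumptions. The Reader class and try_decode_header function are hypothetical and are not Ceph code, and std::out_of_range stands in for whatever exception type Ceph's buffer code actually throws.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a bounds-checked buffer iterator (not Ceph's class).
class Reader {
public:
    explicit Reader(const std::string& data) : data_(data), pos_(0) {}

    // Copy len bytes into dest; throw if fewer than len bytes remain.
    void copy(std::size_t len, char* dest) {
        if (len > data_.size() - pos_) {
            throw std::out_of_range("read past end of buffer");
        }
        std::memcpy(dest, data_.data() + pos_, len);
        pos_ += len;
    }

private:
    std::string data_;  // owned copy, so temporaries are safe to pass in
    std::size_t pos_;
};

// Hypothetical decoder: reads a 4-byte header and reports failure
// instead of letting the exception escape and terminate the process.
bool try_decode_header(const std::string& blob, std::uint32_t& out) {
    Reader r(blob);
    try {
        char raw[sizeof(out)];
        r.copy(sizeof(raw), raw);
        std::memcpy(&out, raw, sizeof(out));
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

If the catch block were absent and the blob were empty, the exception would propagate out of the calling thread and the default terminate handler would abort, which would match the shape of the frames above.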
* Re: Assert in OSDMap::Incremental::decode with recent git
From: Christoph Hellwig @ 2011-11-15 11:55 UTC
To: Josh Pieper; +Cc: ceph-devel
On Tue, Nov 15, 2011 at 06:37:47AM -0500, Josh Pieper wrote:
> Hello,
>
> I was trying to test http://tracker.newdream.net/issues/1708, but in
> the process I keep getting asserts in one of my monitors.
>
> All I have done is bring up a simple 3-node cluster, each node running
> a mon, an osd, and an mds. All nodes run amd64 Ubuntu 11.10 with git
> 2e195500. Once the cluster is up, I apply an rbd load: a single VM
> reading large files. Three times now I have gotten the same assert
> out of the mon, inlined below.
I hit the same assert when trying to bring up a test cluster on a single
physical machine. As soon as I moved to vstart.sh I couldn't reproduce
it anymore.
* Re: Assert in OSDMap::Incremental::decode with recent git
From: Gregory Farnum @ 2011-11-15 19:44 UTC
To: ceph-devel; +Cc: Josh Pieper, Christoph Hellwig
On Tue, Nov 15, 2011 at 3:55 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Nov 15, 2011 at 06:37:47AM -0500, Josh Pieper wrote:
>> Hello,
>>
>> I was trying to test http://tracker.newdream.net/issues/1708, but in
>> the process I keep getting asserts in one of my monitors.
>>
>> All I have done is bring up a simple 3-node cluster, each node running
>> a mon, an osd, and an mds. All nodes run amd64 Ubuntu 11.10 with git
>> 2e195500. Once the cluster is up, I apply an rbd load: a single VM
>> reading large files. Three times now I have gotten the same assert
>> out of the mon, inlined below.
>
> I hit the same assert when trying to bring up a test cluster on a single
> physical machine. As soon as I moved to vstart.sh I couldn't reproduce
> it anymore.
Hmm, interesting that it doesn't happen on vstart, since that's
supposed to use the new mon bootstrapping pieces as well.
Josh, can you turn up monitor debugging and send me the log/post it
somewhere? Presumably the big refactor Sage referred to broke
something here.
-Greg
* Re: Assert in OSDMap::Incremental::decode with recent git
From: Josh Pieper @ 2011-11-16 11:16 UTC
To: ceph-devel; +Cc: Christoph Hellwig, Gregory Farnum
Gregory Farnum wrote:
> On Tue, Nov 15, 2011 at 3:55 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > I hit the same assert when trying to bring up a test cluster on a single
> > physical machine. As soon as I moved to vstart.sh I couldn't reproduce
> > it anymore.
>
> Hmm, interesting that it doesn't happen on vstart, since that's
> supposed to use the new mon bootstrapping pieces as well.
>
> Josh, can you turn up monitor debugging and send me the log/post it
> somewhere? Presumably the big refactor Sage referred to broke
> something here.
http://joshp.no-ip.com:8080/20111116-ceph-mon.2.log
I've inlined a snippet from the end.
-Josh
2011-11-16 06:07:13.828322 7fce87736700 -- 192.168.122.74:6789/0 <== mon.0 192.168.122.95:6789/0 30 ==== paxos(osdmap lease lc 144 fc 142 pn 0 opn 0) v1 ==== 84+0+0 (3125715891 0 0) 0x17f2900 con 0x1735780
2011-11-16 06:07:13.828329 7fce87736700 mon.2@2(peon) e1 have connection
2011-11-16 06:07:13.828336 7fce87736700 mon.2@2(peon) e1 ms_dispatch existing session MonSession: mon.0 192.168.122.95:6789/0 is openallow * for mon.0 192.168.122.95:6789/0
2011-11-16 06:07:13.828340 7fce87736700 mon.2@2(peon) e1 caps allow *
2011-11-16 06:07:13.828347 7fce87736700 mon.2@2(peon).paxos(osdmap active c 141..144) handle_lease on 144 now 2011-11-16 06:07:17.183529
2011-11-16 06:07:13.828356 7fce87736700 -- 192.168.122.74:6789/0 --> mon.0 192.168.122.95:6789/0 -- paxos(osdmap lease_ack lc 144 fc 141 pn 0 opn 0) v1 -- ?+0 0x17f2b40
2011-11-16 06:07:13.828471 7fce87736700 mon.2@2(peon).paxos(osdmap active c 141..144) trim_to 142 (was 141), latest_stashed 141
2011-11-16 06:07:13.828485 7fce87736700 store(/data/mon2) set_int osdmap/first_committed = 141
2011-11-16 06:07:14.126377 7fce86431700 mon.2@2(peon) e1 ms_verify_authorizer 192.168.122.1:0/1024415 client protocol 0
2011-11-16 06:07:18.370639 7fce87736700 mon.2@2(peon).paxosservice(osdmap) _active
2011-11-16 06:07:18.370657 7fce87736700 mon.2@2(peon).osd e130 update_from_paxos paxos e 144, my e 130
2011-11-16 06:07:18.370691 7fce87736700 store(/data/mon2) get_bl osdmap/131 No such file or directory
2011-11-16 06:07:18.370698 7fce87736700 mon.2@2(peon).osd e130 update_from_paxos applying incremental 131
*** Caught signal (Aborted) **
in thread 7fce87736700
ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)
1: /usr/bin/ceph-mon() [0x5c2fa6]
2: (()+0x10060) [0x7fce8b209060]
3: (gsignal()+0x35) [0x7fce8998a3a5]
4: (abort()+0x17b) [0x7fce8998db0b]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fce8a248d7d]
6: (()+0xb9f26) [0x7fce8a246f26]
7: (()+0xb9f53) [0x7fce8a246f53]
8: (()+0xba04e) [0x7fce8a24704e]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x596237]
10: (OSDMap::Incremental::decode(ceph::buffer::list::iterator&)+0x3f) [0x573b6f]
11: (OSDMonitor::update_from_paxos()+0x7b0) [0x49a9c0]
12: (PaxosService::_active()+0x39) [0x4933f9]
13: (Context::complete(int)+0xa) [0x47c12a]
14: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0xca) [0x47da8a]
15: (Paxos::handle_lease(MMonPaxos*)+0x36d) [0x48aafd]
16: (Paxos::dispatch(PaxosServiceMessage*)+0x21b) [0x48f4db]
17: (Monitor::_ms_dispatch(Message*)+0xcbf) [0x47b66f]
18: (Monitor::ms_dispatch(Message*)+0x35) [0x486425]
19: (SimpleMessenger::dispatch_entry()+0x84b) [0x583e8b]
20: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x46612c]
21: (()+0x7efc) [0x7fce8b200efc]
22: (clone()+0x6d) [0x7fce89a3589d]
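The last log lines hint at the trigger: the monitor is at osdmap epoch 130 while paxos only holds versions 141..144, get_bl for osdmap/131 reports "No such file or directory", and update_from_paxos still goes on to apply incremental 131. Below is a minimal C++ sketch of the kind of guard that would avoid decoding a missing incremental. The Store type, NextStep enum, and plan_update function are hypothetical illustrations, not Ceph's API, and this is not necessarily how Ceph itself handles the case.

```cpp
#include <map>
#include <string>

// Hypothetical types for illustration only; not Ceph's MonitorStore API.
using Epoch = unsigned;
using Store = std::map<Epoch, std::string>;  // epoch -> encoded incremental

enum class NextStep { UpToDate, ApplyIncremental, NeedFullMap };

// Decide how to advance from my_epoch toward paxos_epoch.
// If the next incremental is missing or empty (for example because it was
// trimmed), report that a full map is needed instead of decoding nothing.
NextStep plan_update(Epoch my_epoch, Epoch paxos_epoch, const Store& store) {
    if (my_epoch >= paxos_epoch) {
        return NextStep::UpToDate;
    }
    auto it = store.find(my_epoch + 1);
    if (it == store.end() || it->second.empty()) {
        return NextStep::NeedFullMap;
    }
    return NextStep::ApplyIncremental;
}
```

With a store holding only epochs 141 through 144, a monitor at epoch 130 would get NeedFullMap rather than attempting to decode an empty buffer for epoch 131.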
* Re: Assert in OSDMap::Incremental::decode with recent git
From: Sage Weil @ 2011-11-16 19:14 UTC
To: Josh Pieper; +Cc: ceph-devel, Christoph Hellwig, Gregory Farnum
On Wed, 16 Nov 2011, Josh Pieper wrote:
> Gregory Farnum wrote:
> > On Tue, Nov 15, 2011 at 3:55 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > > I hit the same assert when trying to bring up a test cluster on a single
> > > physical machine. As soon as I moved to vstart.sh I couldn't reproduce
> > > it anymore.
> >
> > Hmm, interesting that it doesn't happen on vstart, since that's
> > supposed to use the new mon bootstrapping pieces as well.
> >
> > Josh, can you turn up monitor debugging and send me the log/post it
> > somewhere? Presumably the big refactor Sage referred to broke
> > something here.
>
> http://joshp.no-ip.com:8080/20111116-ceph-mon.2.log
>
> I've inlined a snippet from the end.
Thanks! I've pushed a fix to master.
sage