* OSD crashed today in os/JournalingObjectStore.cc
@ 2012-12-05 9:56 Stefan Priebe - Profihost AG
2012-12-05 14:41 ` Sage Weil
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-05 9:56 UTC (permalink / raw)
To: ceph-devel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 166 bytes --]
Hello list,
I updated to the latest next from today, and after 20 minutes an OSD
crashed in os/JournalingObjectStore.cc.
Attached is the log.
Greets,
Stefan
[-- Attachment #2: ceph-osd.43.log --]
[-- Type: text/x-log, Size: 14643 bytes --]
2012-12-05 10:21:12.591166 7f57aeeb9700 0 monclient: hunting for new mon
2012-12-05 10:21:14.338644 7f578e966700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.100:6802/28708 pipe(0xe061000 sd=67 :34107 pgs=50 cs=13 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:14.338786 7f57c6368700 0 -- 10.255.0.103:0/15121 >> 10.255.0.100:6803/28708 pipe(0xd56e900 sd=28 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:15.748915 7f578eb68700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.100:6808/29075 pipe(0xddd1480 sd=74 :6807 pgs=46 cs=27 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:15.749020 7f578c23f700 0 -- 10.255.0.103:0/15121 >> 10.255.0.100:6809/29075 pipe(0xc96b6c0 sd=47 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:17.029751 7f5789f06700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.100:6811/29438 pipe(0x11ed56c0 sd=75 :6807 pgs=76 cs=21 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:17.029925 7f578be3b700 0 -- 10.255.0.103:0/15121 >> 10.255.0.100:6814/29438 pipe(0xcf876c0 sd=55 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:18.334263 7f578fa77700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.100:6819/29801 pipe(0xd0bb480 sd=79 :6807 pgs=85 cs=43 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:18.334403 7f578a007700 0 -- 10.255.0.103:0/15121 >> 10.255.0.100:6821/29801 pipe(0x12024b40 sd=28 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:20.375215 7f578fb78700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.101:6801/8284 pipe(0xdb0ed80 sd=42 :6807 pgs=39 cs=9 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:20.375381 7f578be3b700 0 -- 10.255.0.103:0/15121 >> 10.255.0.101:6802/8284 pipe(0x100656c0 sd=59 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:22.637693 7f5789a01700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.101:6804/8467 pipe(0x13a23d80 sd=77 :6807 pgs=182 cs=15 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:22.637861 7f578f976700 0 -- 10.255.0.103:0/15121 >> 10.255.0.101:6805/8467 pipe(0xd2dcb40 sd=28 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:24.777204 7f578a108700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.101:6807/8647 pipe(0xd8eeb40 sd=40 :6807 pgs=257 cs=29 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:24.777420 7f578b431700 0 -- 10.255.0.103:0/15121 >> 10.255.0.101:6808/8647 pipe(0xceb3900 sd=74 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:26.870074 7f578f16e700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.101:6810/8877 pipe(0x114a56c0 sd=72 :6807 pgs=200 cs=13 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:26.870281 7f578ce4b700 0 -- 10.255.0.103:0/15121 >> 10.255.0.101:6811/8877 pipe(0xceb3480 sd=51 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:28.977016 7f578f471700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.102:6801/6127 pipe(0xd8ee900 sd=38 :6807 pgs=178 cs=15 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:28.977174 7f578db58700 0 -- 10.255.0.103:0/15121 >> 10.255.0.102:6802/6127 pipe(0xceb36c0 sd=40 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:31.091973 7f578f370700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.102:6806/6308 pipe(0xc96cd80 sd=36 :6807 pgs=260 cs=1 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:31.092196 7f578f16e700 0 -- 10.255.0.103:0/15121 >> 10.255.0.102:6807/6308 pipe(0xdbbc6c0 sd=31 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:33.200579 7f578f26f700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.102:6809/6491 pipe(0xc96cb40 sd=35 :6807 pgs=261 cs=1 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:33.200853 7f578f471700 0 -- 10.255.0.103:0/15121 >> 10.255.0.102:6810/6491 pipe(0xe1cf480 sd=38 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:35.329384 7f578a70e700 0 -- 10.255.0.103:6807/15121 >> 10.255.0.102:6822/6670 pipe(0xfad4b40 sd=70 :6807 pgs=319 cs=9 l=0).fault with nothing to send, going to standby
2012-12-05 10:21:35.329523 7f578d754700 0 -- 10.255.0.103:0/15121 >> 10.255.0.102:6823/6670 pipe(0xfad4240 sd=72 :0 pgs=0 cs=0 l=1).fault
2012-12-05 10:21:42.031928 7f57c26e0700 -1 osd.43 923 *** Got signal Terminated ***
2012-12-05 10:21:42.032002 7f57c26e0700 -1 osd.43 923 pausing thread pools
2012-12-05 10:21:42.032007 7f57c26e0700 -1 osd.43 923 flushing io
2012-12-05 10:21:42.032015 7f57c26e0700 -1 osd.43 923 removing pid file
2012-12-05 10:21:42.032092 7f57c26e0700 -1 osd.43 923 exit
2012-12-05 10:21:43.608251 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is supported and appears to work
2012-12-05 10:21:43.608262 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2012-12-05 10:21:43.608495 7fd046962780 0 filestore(/ceph/osd.43/) mount did NOT detect btrfs
2012-12-05 10:21:43.613072 7fd046962780 0 filestore(/ceph/osd.43/) mount syscall(__NR_syncfs, fd) fully supported
2012-12-05 10:21:43.613151 7fd046962780 0 filestore(/ceph/osd.43/) mount found snaps <>
2012-12-05 10:21:43.615479 7fd046962780 0 filestore(/ceph/osd.43/) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2012-12-05 10:21:43.638102 7fd046962780 0 journal kernel version is 3.6.7
2012-12-05 10:21:43.768129 7fd046962780 0 journal kernel version is 3.6.7
2012-12-05 10:21:43.819826 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is supported and appears to work
2012-12-05 10:21:43.819835 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2012-12-05 10:21:43.820065 7fd046962780 0 filestore(/ceph/osd.43/) mount did NOT detect btrfs
2012-12-05 10:21:43.821567 7fd046962780 0 filestore(/ceph/osd.43/) mount syscall(__NR_syncfs, fd) fully supported
2012-12-05 10:21:43.821622 7fd046962780 0 filestore(/ceph/osd.43/) mount found snaps <>
2012-12-05 10:21:43.822791 7fd046962780 0 filestore(/ceph/osd.43/) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2012-12-05 10:21:43.837954 7fd046962780 0 journal kernel version is 3.6.7
2012-12-05 10:21:43.898018 7fd046962780 0 journal kernel version is 3.6.7
2012-12-05 10:46:40.709056 7fd03c4b6700 -1 os/JournalingObjectStore.cc: In function 'uint64_t JournalingObjectStore::ApplyManager::op_apply_start(uint64_t)' thread 7fd03c4b6700 time 2012-12-05 10:46:40.338489
os/JournalingObjectStore.cc: 134: FAILED assert(op > committed_seq)
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
2: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
3: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
4: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
5: (()+0x68ca) [0x7fd04633f8ca]
6: (clone()+0x6d) [0x7fd0447aeb6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-29> 2012-12-05 10:21:43.592318 7fd046962780 5 asok(0x244b000) register_command perfcounters_dump hook 0x243f010
-28> 2012-12-05 10:21:43.592340 7fd046962780 5 asok(0x244b000) register_command 1 hook 0x243f010
-27> 2012-12-05 10:21:43.592342 7fd046962780 5 asok(0x244b000) register_command perf dump hook 0x243f010
-26> 2012-12-05 10:21:43.592350 7fd046962780 5 asok(0x244b000) register_command perfcounters_schema hook 0x243f010
-25> 2012-12-05 10:21:43.592354 7fd046962780 5 asok(0x244b000) register_command 2 hook 0x243f010
-24> 2012-12-05 10:21:43.592357 7fd046962780 5 asok(0x244b000) register_command perf schema hook 0x243f010
-23> 2012-12-05 10:21:43.592359 7fd046962780 5 asok(0x244b000) register_command config show hook 0x243f010
-22> 2012-12-05 10:21:43.592361 7fd046962780 5 asok(0x244b000) register_command config set hook 0x243f010
-21> 2012-12-05 10:21:43.592363 7fd046962780 5 asok(0x244b000) register_command log flush hook 0x243f010
-20> 2012-12-05 10:21:43.592365 7fd046962780 5 asok(0x244b000) register_command log dump hook 0x243f010
-19> 2012-12-05 10:21:43.592367 7fd046962780 5 asok(0x244b000) register_command log reopen hook 0x243f010
-18> 2012-12-05 10:21:43.594773 7fd046962780 0 ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4), process ceph-osd, pid 31785
-17> 2012-12-05 10:21:43.595944 7fd046962780 1 finished global_init_daemonize
-16> 2012-12-05 10:21:43.608251 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is supported and appears to work
-15> 2012-12-05 10:21:43.608262 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
-14> 2012-12-05 10:21:43.608495 7fd046962780 0 filestore(/ceph/osd.43/) mount did NOT detect btrfs
-13> 2012-12-05 10:21:43.613072 7fd046962780 0 filestore(/ceph/osd.43/) mount syscall(__NR_syncfs, fd) fully supported
-12> 2012-12-05 10:21:43.613151 7fd046962780 0 filestore(/ceph/osd.43/) mount found snaps <>
-11> 2012-12-05 10:21:43.615479 7fd046962780 0 filestore(/ceph/osd.43/) mount: enabling WRITEAHEAD journal mode: btrfs not detected
-10> 2012-12-05 10:21:43.638102 7fd046962780 0 journal kernel version is 3.6.7
-9> 2012-12-05 10:21:43.768129 7fd046962780 0 journal kernel version is 3.6.7
-8> 2012-12-05 10:21:43.819826 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is supported and appears to work
-7> 2012-12-05 10:21:43.819835 7fd046962780 0 filestore(/ceph/osd.43/) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
-6> 2012-12-05 10:21:43.820065 7fd046962780 0 filestore(/ceph/osd.43/) mount did NOT detect btrfs
-5> 2012-12-05 10:21:43.821567 7fd046962780 0 filestore(/ceph/osd.43/) mount syscall(__NR_syncfs, fd) fully supported
-4> 2012-12-05 10:21:43.821622 7fd046962780 0 filestore(/ceph/osd.43/) mount found snaps <>
-3> 2012-12-05 10:21:43.822791 7fd046962780 0 filestore(/ceph/osd.43/) mount: enabling WRITEAHEAD journal mode: btrfs not detected
-2> 2012-12-05 10:21:43.837954 7fd046962780 0 journal kernel version is 3.6.7
-1> 2012-12-05 10:21:43.898018 7fd046962780 0 journal kernel version is 3.6.7
0> 2012-12-05 10:46:40.709056 7fd03c4b6700 -1 os/JournalingObjectStore.cc: In function 'uint64_t JournalingObjectStore::ApplyManager::op_apply_start(uint64_t)' thread 7fd03c4b6700 time 2012-12-05 10:46:40.338489
os/JournalingObjectStore.cc: 134: FAILED assert(op > committed_seq)
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
2: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
3: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
4: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
5: (()+0x68ca) [0x7fd04633f8ca]
6: (clone()+0x6d) [0x7fd0447aeb6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 0 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 0 optracker
0/ 0 objclass
0/ 0 filestore
0/ 0 journal
0/ 0 ms
1/ 5 mon
0/ 0 monc
0/ 5 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/ceph-osd.43.log
--- end dump of recent events ---
2012-12-05 10:46:40.710600 7fd03c4b6700 -1 *** Caught signal (Aborted) **
in thread 7fd03c4b6700
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: /usr/bin/ceph-osd() [0x797bd9]
2: (()+0xeff0) [0x7fd046347ff0]
3: (gsignal()+0x35) [0x7fd0447111b5]
4: (abort()+0x180) [0x7fd044713fc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd044fa5dc5]
6: (()+0xcb166) [0x7fd044fa4166]
7: (()+0xcb193) [0x7fd044fa4193]
8: (()+0xcb28e) [0x7fd044fa428e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x7fb939]
10: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
11: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
13: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
14: (()+0x68ca) [0x7fd04633f8ca]
15: (clone()+0x6d) [0x7fd0447aeb6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2012-12-05 10:46:40.710600 7fd03c4b6700 -1 *** Caught signal (Aborted) **
in thread 7fd03c4b6700
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: /usr/bin/ceph-osd() [0x797bd9]
2: (()+0xeff0) [0x7fd046347ff0]
3: (gsignal()+0x35) [0x7fd0447111b5]
4: (abort()+0x180) [0x7fd044713fc0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd044fa5dc5]
6: (()+0xcb166) [0x7fd044fa4166]
7: (()+0xcb193) [0x7fd044fa4193]
8: (()+0xcb28e) [0x7fd044fa428e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x7fb939]
10: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
11: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
13: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
14: (()+0x68ca) [0x7fd04633f8ca]
15: (clone()+0x6d) [0x7fd0447aeb6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 0 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 0 optracker
0/ 0 objclass
0/ 0 filestore
0/ 0 journal
0/ 0 ms
1/ 5 mon
0/ 0 monc
0/ 5 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/ceph-osd.43.log
--- end dump of recent events ---
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-05 9:56 OSD crashed today in os/JournalingObjectStore.cc Stefan Priebe - Profihost AG
@ 2012-12-05 14:41 ` Sage Weil
2012-12-05 16:05 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2012-12-05 14:41 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> I updated to the latest next from today, and after 20 minutes an OSD
> crashed in os/JournalingObjectStore.cc.
>
> Attached is the log.
Hmm, this is perplexing. It might just be a bad assert, but I can't see
how it could happen. Any chance you can reproduce with
debug journal = 0/10
in the [osd] section? That will give us a dump if it fails the assert.
Thanks!
s
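For reference, the setting requested above belongs under the [osd] section of ceph.conf. Below is a minimal sketch of adding it programmatically. It assumes the configuration text is simple enough for Python's configparser to read, which may not hold for a real ceph.conf. The helper name and the sample text are hypothetical, and comments in the input are not preserved.

```python
import configparser
import io


def set_osd_journal_debug(conf_text: str, level: str = "0/10") -> str:
    """Return conf_text with 'debug journal = <level>' set under [osd].

    Hypothetical helper for illustration; not part of Ceph.
    """
    # interpolation=None keeps characters such as '%' or '$' literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("osd"):
        parser.add_section("osd")
    parser.set("osd", "debug journal", level)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up sample configuration, not taken from the thread.
sample = "[global]\nlog file = /var/log/ceph/$name.log\n\n[osd]\nosd journal size = 1000\n"
print(set_osd_journal_debug(sample))
```

Editing the file by hand works just as well; the sketch only shows where the line goes.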
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-05 14:41 ` Sage Weil
@ 2012-12-05 16:05 ` Stefan Priebe - Profihost AG
2012-12-05 22:25 ` Stefan Priebe
2012-12-05 23:36 ` Sage Weil
0 siblings, 2 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-05 16:05 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
There was a dump in the attached log.
Stefan
On 05.12.2012 at 15:41, Sage Weil <sage@inktank.com> wrote:
> On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> I updated to the latest next from today, and after 20 minutes an OSD
>> crashed in os/JournalingObjectStore.cc.
>>
>> Attached is the log.
>
> Hmm, this is perplexing. It might just be a bad assert, but I can't see
> how it could happen. Any chance you can reproduce with
>
> debug journal = 0/10
>
> in the [osd] section? That will give us a dump if it fails the assert.
>
> Thanks!
> s
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-05 16:05 ` Stefan Priebe - Profihost AG
@ 2012-12-05 22:25 ` Stefan Priebe
2012-12-05 22:29 ` Stefan Priebe
2012-12-05 23:36 ` Sage Weil
1 sibling, 1 reply; 13+ messages in thread
From: Stefan Priebe @ 2012-12-05 22:25 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hello,
I now had 8 OSDs fail again with the same error.
0> 2012-12-05 23:10:41.213149 7f7fad109700 -1 os/JournalingObjectStore.cc: In function 'uint64_t JournalingObjectStore::ApplyManager::op_apply_start(uint64_t)' thread 7f7fad109700 time 2012-12-05 23:10:41.212454
os/JournalingObjectStore.cc: 134: FAILED assert(op > committed_seq)
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
2: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
3: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
4: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
5: (()+0x68ca) [0x7f7fc17a78ca]
6: (clone()+0x6d) [0x7f7fbfc16bfd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 0 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 0 optracker
0/ 0 objclass
0/ 0 filestore
0/ 0 journal
0/ 0 ms
1/ 5 mon
0/ 0 monc
0/ 5 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/ceph-osd.13.log
--- end dump of recent events ---
2012-12-05 23:10:41.216011 7f7fad109700 -1 *** Caught signal (Aborted) **
in thread 7f7fad109700
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: /usr/bin/ceph-osd() [0x797bd9]
2: (()+0xeff0) [0x7f7fc17afff0]
3: (gsignal()+0x35) [0x7f7fbfb79215]
4: (abort()+0x180) [0x7f7fbfb7c020]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f7fc040ddc5]
6: (()+0xcb166) [0x7f7fc040c166]
7: (()+0xcb193) [0x7f7fc040c193]
8: (()+0xcb28e) [0x7f7fc040c28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x7fb939]
10: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
11: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
13: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
14: (()+0x68ca) [0x7f7fc17a78ca]
15: (clone()+0x6d) [0x7f7fbfc16bfd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
0> 2012-12-05 23:10:41.216011 7f7fad109700 -1 *** Caught signal (Aborted) **
in thread 7f7fad109700
ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)
1: /usr/bin/ceph-osd() [0x797bd9]
2: (()+0xeff0) [0x7f7fc17afff0]
3: (gsignal()+0x35) [0x7f7fbfb79215]
4: (abort()+0x180) [0x7f7fbfb7c020]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f7fc040ddc5]
6: (()+0xcb166) [0x7f7fc040c166]
7: (()+0xcb193) [0x7f7fc040c193]
8: (()+0xcb28e) [0x7f7fc040c28e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x7fb939]
10: (JournalingObjectStore::ApplyManager::op_apply_start(unsigned long)+0x816) [0x747626]
11: (FileStore::_do_op(FileStore::OpSequencer*)+0x52) [0x703c22]
12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x82b) [0x82f81b]
13: (ThreadPool::WorkThread::entry()+0x10) [0x832000]
14: (()+0x68ca) [0x7f7fc17a78ca]
15: (clone()+0x6d) [0x7f7fbfc16bfd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 0 lockdep
0/ 0 context
0/ 0 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 0 buffer
0/ 0 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 0 journaler
0/ 5 objectcacher
0/ 5 client
0/ 0 osd
0/ 0 optracker
0/ 0 objclass
0/ 0 filestore
0/ 0 journal
0/ 0 ms
1/ 5 mon
0/ 0 monc
0/ 5 paxos
0/ 0 tp
0/ 0 auth
1/ 5 crypto
0/ 0 finisher
0/ 0 heartbeatmap
0/ 0 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
0/ 0 asok
0/ 0 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 100000
max_new 1000
log_file /var/log/ceph/ceph-osd.13.log
--- end dump of recent events ---
Stefan
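When several OSDs fail at once, it can help to confirm that every log contains the same failed assertion. Below is a minimal sketch that extracts "FAILED assert" lines of the shape seen in the dump above. The function and type names are hypothetical, and the pattern is an assumption based only on the log lines in this thread.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class FailedAssert(NamedTuple):
    source_file: str
    line_number: int
    expression: str


# Matches lines such as:
#   os/JournalingObjectStore.cc: 134: FAILED assert(op > committed_seq)
_ASSERT_RE = re.compile(r"(\S+): (\d+): FAILED assert\((.*)\)")


def find_failed_asserts(lines: Iterable[str]) -> Iterator[FailedAssert]:
    """Yield one FailedAssert for each log line that reports a failed assert."""
    for line in lines:
        match = _ASSERT_RE.search(line)
        if match:
            yield FailedAssert(match.group(1), int(match.group(2)), match.group(3))


# Two lines copied from the dump in this thread.
sample_log = [
    "ceph version 0.55-142-g22f794d (22f794da074dd1b3221c484a5ae05b2ff1bd0fa4)",
    "os/JournalingObjectStore.cc: 134: FAILED assert(op > committed_seq)",
]
for hit in find_failed_asserts(sample_log):
    print(hit)
```

To scan a real file, pass an open file object instead of the sample list, since the function accepts any iterable of lines.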
On 05.12.2012 17:05, Stefan Priebe - Profihost AG wrote:
> There was a dump in the attached log.
>
> Stefan
>
> On 05.12.2012 at 15:41, Sage Weil <sage@inktank.com> wrote:
>
>> On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
>>> Hello list,
>>>
>>> I updated to the latest next from today, and after 20 minutes an OSD
>>> crashed in os/JournalingObjectStore.cc.
>>>
>>> Attached is the log.
>>
>> Hmm, this is perplexing. It might just be a bad assert, but I can't see
>> how it could happen. Any chance you can reproduce with
>>
>> debug journal = 0/10
>>
>> in the [osd] section? That will give us a dump if it fails the assert.
>>
>> Thanks!
>> s
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-05 22:25 ` Stefan Priebe
@ 2012-12-05 22:29 ` Stefan Priebe
0 siblings, 0 replies; 13+ messages in thread
From: Stefan Priebe @ 2012-12-05 22:29 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hello,
this seems to have started with commit:
85574a3
Stefan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-05 16:05 ` Stefan Priebe - Profihost AG
2012-12-05 22:25 ` Stefan Priebe
@ 2012-12-05 23:36 ` Sage Weil
2012-12-06 9:38 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 13+ messages in thread
From: Sage Weil @ 2012-12-05 23:36 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> There was a dump in the attached log.
The stack trace is there, but with 'debug journal = 0/20' in your conf it
will also dump all of the journal logging activity leading up to that
point. Can you reproduce with that enabled? That should tell me why op <
committed_seq.
Thanks!
sage
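To make the failed check concrete: the assert in the trace requires an op's sequence number to be strictly greater than committed_seq when the op starts being applied. Below is a toy model of that single check. The class and method names are hypothetical and only loosely echo the trace; this is not Ceph's implementation and says nothing about what caused the bug.

```python
class ApplyManagerModel:
    """Toy model of the one check named in the trace; not Ceph code."""

    def __init__(self) -> None:
        self.committed_seq = 0  # highest sequence number treated as committed
        self.open_ops = 0       # ops that have started applying (illustrative)

    def commit(self, seq: int) -> None:
        # Assumption for this sketch: commits only move the marker forward.
        self.committed_seq = max(self.committed_seq, seq)

    def op_apply_start(self, op: int) -> int:
        # Mirrors the condition in the log: assert(op > committed_seq).
        if not op > self.committed_seq:
            raise AssertionError(
                f"FAILED assert(op > committed_seq): op={op}, "
                f"committed_seq={self.committed_seq}"
            )
        self.open_ops += 1
        return op


model = ApplyManagerModel()
model.commit(10)
model.op_apply_start(11)      # passes: 11 > 10
try:
    model.op_apply_start(10)  # fails: 10 is not greater than 10
except AssertionError as err:
    print(err)
```

In this model the check trips whenever an op arrives with a sequence number at or below the committed marker, which is the situation the extra journal logging is meant to explain.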
>
> Stefan
>
> On 05.12.2012 at 15:41, Sage Weil <sage@inktank.com> wrote:
>
> > On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> >> Hello list,
> >>
> >> I updated to the latest next from today, and after 20 minutes an OSD
> >> crashed in os/JournalingObjectStore.cc.
> >>
> >> Attached is the log.
> >
> > Hmm, this is perplexing. It might just be a bad assert, but I can't see
> > how it could happen. Any chance you can reproduce with
> >
> > debug journal = 0/10
> >
> > in the [osd] section? That will give us a dump if it fails the assert.
> >
> > Thanks!
> > s
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-05 23:36 ` Sage Weil
@ 2012-12-06 9:38 ` Stefan Priebe - Profihost AG
2012-12-06 14:43 ` Sage Weil
2012-12-07 0:38 ` Sage Weil
0 siblings, 2 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-06 9:38 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hi,
here is a new dump / crash:
https://www.dropbox.com/s/1qhg0dd0fv17q10/ceph-osd.54.log.gz
Stefan
On 06.12.2012 00:36, Sage Weil wrote:
> On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
>> There was a dump in the attached log.
>
> The stack trace is there, but with 'debug journal = 0/20' in your conf it
> will also dump all of the journal logging activity leading up to that
> point. Can you reproduce with that enabled? That should tell me why op <
> committed_seq.
>
> Thanks!
> sage
>
>
>>
>> Stefan
>>
>> On 05.12.2012 at 15:41, Sage Weil <sage@inktank.com> wrote:
>>
>>> On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
>>>> Hello list,
>>>>
>>>> I updated to the latest next from today, and after 20 minutes an OSD
>>>> crashed in os/JournalingObjectStore.cc.
>>>>
>>>> Attached is the log.
>>>
>>> Hmm, this is perplexing. It might just be a bad assert, but I can't see
>>> how it could happen. Any chance you can reproduce with
>>>
>>> debug journal = 0/10
>>>
>>> in the [osd] section? That will give us a dump if it fails the assert.
>>>
>>> Thanks!
>>> s
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-06 9:38 ` Stefan Priebe - Profihost AG
@ 2012-12-06 14:43 ` Sage Weil
2012-12-06 14:47 ` Stefan Priebe - Profihost AG
2012-12-07 0:38 ` Sage Weil
1 sibling, 1 reply; 13+ messages in thread
From: Sage Weil @ 2012-12-06 14:43 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
On Thu, 6 Dec 2012, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> here a new dump / crash:
> https://www.dropbox.com/s/1qhg0dd0fv17q10/ceph-osd.54.log.gz
Awesome, thanks! I see the bug now. Working out a fix.
In the meantime, you can revert 85574a36226611ccf0fb7591fd275a2bdcca2bad
and 528108485be7912069087822e5b7a1a2f1dd515e.
sage
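For reference, a minimal sketch of the workaround above, assuming a local ceph.git checkout on the branch being built. The `revert_commits` helper is hypothetical (it is not part of ceph or git) and simply runs `git revert --no-edit` for each hash in the order given; if the two reverts conflict, the order may need to be swapped. The hashes are the ones named in the message.

```shell
# Hypothetical helper (not part of ceph or git): revert each given commit
# in the current checkout, creating one revert commit per hash.
revert_commits() {
    for commit in "$@"; do
        # --no-edit keeps the default "Revert ..." message without opening an editor
        git revert --no-edit "$commit" || return 1
    done
}

# Usage, run from inside the ceph.git checkout (assumed to be on "next"):
#   revert_commits 85574a36226611ccf0fb7591fd275a2bdcca2bad \
#                  528108485be7912069087822e5b7a1a2f1dd515e
```

The helper stops at the first revert that fails, so a conflict leaves the tree in git's normal in-progress revert state for manual resolution.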
>
> Stefan
>
> Am 06.12.2012 00:36, schrieb Sage Weil:
> > On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> > > There was a dump in the attached log.
> >
> > The stack trace is there, but with 'debug journal = 0/20' in your conf it
> > will also dump all of the journal logging activity leading up to that
> > point. Can you reproduce with that enabled? That should tell me why op <
> > commited_seq.
> >
> > Thanks!
> > sage
> >
> >
> > >
> > > Stefan
> > >
> > > Am 05.12.2012 um 15:41 schrieb Sage Weil <sage@inktank.com>:
> > >
> > > > On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> > > > > Hello list,
> > > > >
> > > > > i updated to latest next from today and then after 20 minutes an OSD
> > > > > was
> > > > > crashing in os/JournalingObjectStore.cc.
> > > > >
> > > > > Attached is the log.
> > > >
> > > > Hmm, this is perplexing. It might just be a bad assert, but I can't see
> > > > how it could happen. Any chance you can reproduce with
> > > >
> > > > debug journal = 0/10
> > > >
> > > > in the [osd] section? That will give us a dump if it fails the assert.
> > > >
> > > > Thanks!
> > > > s
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-06 9:38 ` Stefan Priebe - Profihost AG
2012-12-06 14:43 ` Sage Weil
@ 2012-12-07 0:38 ` Sage Weil
2012-12-07 7:49 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 13+ messages in thread
From: Sage Weil @ 2012-12-07 0:38 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
Hi Stefan,
I've pushed a few patches to wip-filestore2 that simplify and fix this
code. Can you give it a go?
Thanks!
sage
On Thu, 6 Dec 2012, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> here a new dump / crash:
> https://www.dropbox.com/s/1qhg0dd0fv17q10/ceph-osd.54.log.gz
>
> Stefan
>
> Am 06.12.2012 00:36, schrieb Sage Weil:
> > On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> > > There was a dump in the attached log.
> >
> > The stack trace is there, but with 'debug journal = 0/20' in your conf it
> > will also dump all of the journal logging activity leading up to that
> > point. Can you reproduce with that enabled? That should tell me why op <
> > commited_seq.
> >
> > Thanks!
> > sage
> >
> >
> > >
> > > Stefan
> > >
> > > Am 05.12.2012 um 15:41 schrieb Sage Weil <sage@inktank.com>:
> > >
> > > > On Wed, 5 Dec 2012, Stefan Priebe - Profihost AG wrote:
> > > > > Hello list,
> > > > >
> > > > > i updated to latest next from today and then after 20 minutes an OSD
> > > > > was
> > > > > crashing in os/JournalingObjectStore.cc.
> > > > >
> > > > > Attached is the log.
> > > >
> > > > Hmm, this is perplexing. It might just be a bad assert, but I can't see
> > > > how it could happen. Any chance you can reproduce with
> > > >
> > > > debug journal = 0/10
> > > >
> > > > in the [osd] section? That will give us a dump if it fails the assert.
> > > >
> > > > Thanks!
> > > > s
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-07 0:38 ` Sage Weil
@ 2012-12-07 7:49 ` Stefan Priebe - Profihost AG
2012-12-07 11:02 ` Sage Weil
0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-07 7:49 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hi Sage,
thanks for patching this. I'm not sure whether I can test it. I've
moved my systems to their first production tests and I can't risk
downtime or data loss again ;-)
How stable are these fixes?
Am 07.12.2012 01:38, schrieb Sage Weil:
> Hi Stefan,
>
> I've pushed a few patches to wip-filestore2 that simplify and fix this
> code. Can you give it a go?
>
> Thanks!
> sage
Greets,
Stefan
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-07 7:49 ` Stefan Priebe - Profihost AG
@ 2012-12-07 11:02 ` Sage Weil
2012-12-07 11:29 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 13+ messages in thread
From: Sage Weil @ 2012-12-07 11:02 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG; +Cc: ceph-devel@vger.kernel.org
On Fri, 7 Dec 2012, Stefan Priebe - Profihost AG wrote:
> Hi Sage,
>
> thanks for patching this. I'm not sure whether i can test this. I've moved my
> systems to first productional tests and i can't create a downtime or loss of
> data again ;-)
>
> How stable are these fixes?
Only tested on my laptop. :)
I'll put them through our qa first, and make sure our stress tests
can trigger the old failure.
Thanks!
sage
>
> Am 07.12.2012 01:38, schrieb Sage Weil:
> > Hi Stefan,
> >
> > I've pushed a few patches to wip-filestore2 that simplify and fix this
> > code. Can you give it a go?
> >
> > Thanks!
> > sage
>
> Greets,
> Stefan
>
>
* Re: OSD crashed today in os/JournalingObjectStore.cc
2012-12-07 11:02 ` Sage Weil
@ 2012-12-07 11:29 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-12-07 11:29 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hello Sage,
that would be great. Can you get back to me then?
Greets
Stefan
Am 07.12.2012 12:02, schrieb Sage Weil:
> On Fri, 7 Dec 2012, Stefan Priebe - Profihost AG wrote:
>> Hi Sage,
>>
>> thanks for patching this. I'm not sure whether i can test this. I've moved my
>> systems to first productional tests and i can't create a downtime or loss of
>> data again ;-)
>>
>> How stable are these fixes?
>
> Only tested on my laptop. :)
>
> I'll put them through our qa first, and make sure our stress tests
> can trigger the old failure.
>
> Thanks!
> sage
>
>
>>
>> Am 07.12.2012 01:38, schrieb Sage Weil:
>>> Hi Stefan,
>>>
>>> I've pushed a few patches to wip-filestore2 that simplify and fix this
>>> code. Can you give it a go?
>>>
>>> Thanks!
>>> sage
>>
>> Greets,
>> Stefan
>>
>>
Thread overview: 13+ messages
2012-12-05 9:56 OSD crashed today in os/JournalingObjectStore.cc Stefan Priebe - Profihost AG
2012-12-05 14:41 ` Sage Weil
2012-12-05 16:05 ` Stefan Priebe - Profihost AG
2012-12-05 22:25 ` Stefan Priebe
2012-12-05 22:29 ` Stefan Priebe
2012-12-05 23:36 ` Sage Weil
2012-12-06 9:38 ` Stefan Priebe - Profihost AG
2012-12-06 14:43 ` Sage Weil
2012-12-06 14:47 ` Stefan Priebe - Profihost AG
2012-12-07 0:38 ` Sage Weil
2012-12-07 7:49 ` Stefan Priebe - Profihost AG
2012-12-07 11:02 ` Sage Weil
2012-12-07 11:29 ` Stefan Priebe - Profihost AG