* osd crash during resync
@ 2012-01-24 18:48 Martin Mailand
2012-01-24 21:13 ` Gregory Farnum
2012-01-25 22:08 ` Sage Weil
0 siblings, 2 replies; 6+ messages in thread
From: Martin Mailand @ 2012-01-24 18:48 UTC (permalink / raw)
To: ceph-devel
Hi,
today I tried the btrfs patch mentioned on the btrfs ml. To test it, I
rebooted osd.0 with a new kernel and created a new btrfs on osd.0,
then I brought osd.0 back into the cluster. During the resync of osd.0,
osd.2 and osd.3 crashed.
I am not sure whether the crashes happened because I played with osd.0 or
whether they are bugs.
osd.2
-rw------- 1 root root 1.1G 2012-01-24 12:19
core-ceph-osd-1000-1327403927-s-brick-002
log:
2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting
backfill on osd.0 from (185'113859,185'113859] 0//0 to 196'114038
osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&,
bool)', in thread '7f1fdab26700'
osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
-rw------- 1 root root 758M 2012-01-24 15:58
core-ceph-osd-20755-1327417128-s-brick-002
log:
2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211
lc 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310
373/376/376) [2,1] r=0 lpr=376 rops=1 mlcod 202'286160 active m=6] *
oi->watcher: client.4478 cookie=1
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in
thread '7fe26fdca700'
osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
osd/ReplicatedPG.cc: In function 'void
ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in
thread '7fe26fdca700'
http://85.214.49.87/ceph/20120124/osd.2.log.bz2
osd.3
-rw------- 1 root root 986M 2012-01-24 12:24
core-ceph-osd-962-1327404263-s-brick-003
log:
2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting
backfill on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
http://85.214.49.87/ceph/20120124/osd.3.log.bz2
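For anyone going through the full logs linked above, here is a minimal
sketch for pulling out the failed assertions. It assumes the assert
lines look like the excerpts in this mail ("<file>: <line>: FAILED
assert(<condition>)" at the start of a line); the function name
find_failed_asserts is hypothetical and not part of Ceph, and lines in
the real logs may carry extra prefixes that this pattern does not handle.

```python
import re

# Matches lines such as:
#   osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
# Assumption: the source file path is the first token on the line.
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)\s*$"
)


def find_failed_asserts(log_text):
    """Return (source_file, line_number, condition) for each failed assert."""
    hits = []
    for raw in log_text.splitlines():
        match = ASSERT_RE.match(raw.strip())
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("cond"))
            )
    return hits


# Sample input copied from the log excerpts in this mail.
sample = """\
osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
"""

for hit in find_failed_asserts(sample):
    print(hit)
```

The linked logs are bzip2-compressed, so they would need to be
decompressed (for example with bunzip2) before being passed in as text.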
-martin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: osd crash during resync
2012-01-24 18:48 osd crash during resync Martin Mailand
@ 2012-01-24 21:13 ` Gregory Farnum
2012-01-24 21:22 ` Martin Mailand
2012-01-25 22:08 ` Sage Weil
1 sibling, 1 reply; 6+ messages in thread
From: Gregory Farnum @ 2012-01-24 21:13 UTC (permalink / raw)
To: martin; +Cc: ceph-devel
On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand <martin@tuxadero.com> wrote:
> Hi,
> today I tried the btrfs patch mentioned on the btrfs ml. To test it, I
> rebooted osd.0 with a new kernel and created a new btrfs on osd.0, then
> I brought osd.0 back into the cluster. During the resync of osd.0, osd.2
> and osd.3 crashed.
> I am not sure whether the crashes happened because I played with osd.0 or
> whether they are bugs.
These are OSD-level issues not caused by btrfs, so your new kernel
definitely didn't do it. It's probably fallout from the backfill
changes that got merged in last week. I created new bugs to track
them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
Josh are going wild on some other issues that we've turned up, and
these have been added to the queue, to be picked up as soon as
somebody qualified can get to them. :)
-Greg
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: osd crash during resync
2012-01-24 21:13 ` Gregory Farnum
@ 2012-01-24 21:22 ` Martin Mailand
2012-01-24 21:25 ` Gregory Farnum
0 siblings, 1 reply; 6+ messages in thread
From: Martin Mailand @ 2012-01-24 21:22 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel
Hi Greg,
ok, do you guys still need the core files, or could I delete them?
-martin
On 24.01.2012 22:13, Gregory Farnum wrote:
> On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand <martin@tuxadero.com> wrote:
>> Hi,
>> today I tried the btrfs patch mentioned on the btrfs ml. To test it, I
>> rebooted osd.0 with a new kernel and created a new btrfs on osd.0, then
>> I brought osd.0 back into the cluster. During the resync of osd.0, osd.2
>> and osd.3 crashed.
>> I am not sure whether the crashes happened because I played with osd.0 or
>> whether they are bugs.
>
> These are OSD-level issues not caused by btrfs, so your new kernel
> definitely didn't do it. It's probably fallout from the backfill
> changes that got merged in last week. I created new bugs to track
> them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
> Josh are going wild on some other issues that we've turned up, and
> these have been added to the queue, to be picked up as soon as
> somebody qualified can get to them. :)
> -Greg
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: osd crash during resync
2012-01-24 18:48 osd crash during resync Martin Mailand
2012-01-24 21:13 ` Gregory Farnum
@ 2012-01-25 22:08 ` Sage Weil
2012-01-26 10:18 ` Martin Mailand
1 sibling, 1 reply; 6+ messages in thread
From: Sage Weil @ 2012-01-25 22:08 UTC (permalink / raw)
To: Martin Mailand; +Cc: ceph-devel
Hi Martin,
On Tue, 24 Jan 2012, Martin Mailand wrote:
> Hi,
> today I tried the btrfs patch mentioned on the btrfs ml. To test it, I rebooted
> osd.0 with a new kernel and created a new btrfs on osd.0, then I brought
> osd.0 back into the cluster. During the resync of osd.0, osd.2 and osd.3
> crashed.
> I am not sure whether the crashes happened because I played with osd.0 or
> whether they are bugs.
>
>
> osd.2
> -rw------- 1 root root 1.1G 2012-01-24 12:19
> core-ceph-osd-1000-1327403927-s-brick-002
>
> log:
> 2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
> osd.0 from (185'113859,185'113859] 0//0 to 196'114038
> osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
> in thread '7f1fdab26700'
> osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
>
> -rw------- 1 root root 758M 2012-01-24 15:58
> core-ceph-osd-20755-1327417128-s-brick-002
Can you post the log for osd.0 too?
Thanks!
sage
>
> log:
> 2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
> 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
> r=0 lpr=376 rops=1 mlcod 202'286160 active m=6] * oi->watcher: client.4478
> cookie=1
> osd/ReplicatedPG.cc: In function 'void
> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
> '7fe26fdca700'
> osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
> osd/ReplicatedPG.cc: In function 'void
> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
> '7fe26fdca700'
>
> http://85.214.49.87/ceph/20120124/osd.2.log.bz2
>
>
>
> osd.3
> -rw------- 1 root root 986M 2012-01-24 12:24
> core-ceph-osd-962-1327404263-s-brick-003
>
> log:
> 2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
> on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
> 2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
> osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
> std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
> std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
>
> http://85.214.49.87/ceph/20120124/osd.3.log.bz2
>
>
>
> -martin
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: osd crash during resync
2012-01-25 22:08 ` Sage Weil
@ 2012-01-26 10:18 ` Martin Mailand
0 siblings, 0 replies; 6+ messages in thread
From: Martin Mailand @ 2012-01-26 10:18 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
Hi Sage,
I uploaded the osd.0 log as well.
http://85.214.49.87/ceph/20120124/osd.0.log.bz2
-martin
On 25.01.2012 23:08, Sage Weil wrote:
> Hi Martin,
>
> On Tue, 24 Jan 2012, Martin Mailand wrote:
>> Hi,
>> today I tried the btrfs patch mentioned on the btrfs ml. To test it, I rebooted
>> osd.0 with a new kernel and created a new btrfs on osd.0, then I brought
>> osd.0 back into the cluster. During the resync of osd.0, osd.2 and osd.3
>> crashed.
>> I am not sure whether the crashes happened because I played with osd.0 or
>> whether they are bugs.
>>
>>
>> osd.2
>> -rw------- 1 root root 1.1G 2012-01-24 12:19
>> core-ceph-osd-1000-1327403927-s-brick-002
>>
>> log:
>> 2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
>> osd.0 from (185'113859,185'113859] 0//0 to 196'114038
>> osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
>> in thread '7f1fdab26700'
>> osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
>>
>> -rw------- 1 root root 758M 2012-01-24 15:58
>> core-ceph-osd-20755-1327417128-s-brick-002
>
> Can you post the log for osd.0 too?
>
> Thanks!
> sage
>
>
>
>>
>> log:
>> 2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
>> 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
>> r=0 lpr=376 rops=1 mlcod 202'286160 active m=6] * oi->watcher: client.4478
>> cookie=1
>> osd/ReplicatedPG.cc: In function 'void
>> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
>> '7fe26fdca700'
>> osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
>> osd/ReplicatedPG.cc: In function 'void
>> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
>> '7fe26fdca700'
>>
>> http://85.214.49.87/ceph/20120124/osd.2.log.bz2
>>
>>
>>
>> osd.3
>> -rw------- 1 root root 986M 2012-01-24 12:24
>> core-ceph-osd-962-1327404263-s-brick-003
>>
>> log:
>> 2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
>> on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
>> 2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
>> osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
>> std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
>> std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
>>
>> http://85.214.49.87/ceph/20120124/osd.3.log.bz2
>>
>>
>>
>> -martin
>>
>>
^ permalink raw reply [flat|nested] 6+ messages in thread