osd crash during resync

* osd crash during resync
From: Martin Mailand @ 2012-01-24 18:48 UTC
  To: ceph-devel

Hi,
today I tried the btrfs patch mentioned on the btrfs mailing list. To do so,
I rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
brought osd.0 back into the cluster. During the resync of osd.0, osd.2 and
osd.3 crashed.
I am not sure whether the crashes happened because I played with osd.0 or
whether they are bugs.


osd.2
-rw-------  1 root root 1.1G 2012-01-24 12:19 core-ceph-osd-1000-1327403927-s-brick-002
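The core file names appear to follow a `core-<executable>-<pid>-<unix time>-<host>` layout, which is what a `kernel.core_pattern` of `core-%e-%p-%t-%h` would produce; that pattern is an assumption, since the report does not say which one is configured. Under that assumption, the following C++ sketch (the names `CoreName`, `parse_core_name` and `utc_string` are hypothetical) extracts the timestamp so each core can be matched against the logs. For example, 1327403927 decodes to 2012-01-24 11:18:47 UTC, i.e. 12:18:47 if the host runs on UTC+1, which fits the 12:19 file time above; 1327417128 decodes to 14:58:48 UTC, matching the 15:58:48 crash entry for osd.2.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <ctime>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical decoder for core files named
//   core-<executable>-<pid>-<unix time>-<host>
// This layout is an ASSUMPTION (what kernel.core_pattern "core-%e-%p-%t-%h"
// would produce); the report does not state the pattern in use.
struct CoreName {
  std::string exe;
  long long pid;
  std::time_t when;  // seconds since the Unix epoch
  std::string host;
};

static bool all_digits(const std::string& s) {
  if (s.empty()) return false;
  for (unsigned char c : s)
    if (!std::isdigit(c)) return false;
  return true;
}

// Joins tok[b, e) back together with '-'.
static std::string join(const std::vector<std::string>& tok, std::size_t b,
                        std::size_t e) {
  std::string out;
  for (std::size_t i = b; i < e; ++i) {
    if (i > b) out += '-';
    out += tok[i];
  }
  return out;
}

std::optional<CoreName> parse_core_name(const std::string& name) {
  std::vector<std::string> tok;
  std::stringstream ss(name);
  for (std::string t; std::getline(ss, t, '-');) tok.push_back(t);
  if (tok.size() < 5 || tok[0] != "core") return std::nullopt;
  // Executable and host names may contain '-', so anchor on the first
  // pair of adjacent all-digit fields: <pid>-<unix time>.
  for (std::size_t i = 2; i + 2 < tok.size(); ++i) {
    if (all_digits(tok[i]) && all_digits(tok[i + 1])) {
      return CoreName{join(tok, 1, i), std::stoll(tok[i]),
                      static_cast<std::time_t>(std::stoll(tok[i + 1])),
                      join(tok, i + 2, tok.size())};
    }
  }
  return std::nullopt;
}

// Formats a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
std::string utc_string(std::time_t t) {
  char buf[32];
  std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", std::gmtime(&t));
  return buf;
}
```

Keying on the first pair of adjacent all-digit fields keeps host names such as `s-brick-002` intact; an executable name that itself contained two adjacent all-digit fields would confuse this heuristic.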

log:
2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on osd.0 from (185'113859,185'113859] 0//0 to 196'114038
osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)', in thread '7f1fdab26700'
osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
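The failed check `assert(recovery_ops_active > 0)` in `PG::finish_recovery_op()` expresses an invariant: a recovery op can only be finished while at least one is still counted as active. The following is a hypothetical C++ model of that bookkeeping (the class `RecoveryOpCounter` and its methods are illustrative, not Ceph's code). It shows one way such a counter can be driven below zero: the count is dropped, for instance by a hypothetical reset when backfill restarts, while an op is still in flight, and that op's late completion then has nothing to decrement. This is an assumption for illustration, not a diagnosis of this crash.

```cpp
#include <cassert>

// Hypothetical model of the invariant behind
// "FAILED assert(recovery_ops_active > 0)" in PG::finish_recovery_op():
// every finished recovery op must pair with an earlier started one.
// Names are illustrative; this is not Ceph's PG class.
class RecoveryOpCounter {
 public:
  void start_op() { ++active_; }

  // Returns false where the real code would hit the failed assert,
  // so the violation can be observed without aborting the process.
  bool finish_op() {
    if (active_ <= 0) return false;
    --active_;
    return true;
  }

  // Hypothetical: bookkeeping discarded when backfill restarts,
  // even though an op may still be in flight.
  void reset_on_restart() { active_ = 0; }

  int active() const { return active_; }

 private:
  int active_ = 0;
};
```

A normal start/finish pair leaves the counter at zero; a second finish for the same op, or a finish arriving after a reset, is exactly the case the assert rejects.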

-rw-------  1 root root 758M 2012-01-24 15:58 core-ceph-osd-20755-1327417128-s-brick-002

log:
2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1] r=0 lpr=376 rops=1 mlcod 202'286160 active m=6]  * oi->watcher: client.4478 cookie=1
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread '7fe26fdca700'
osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread '7fe26fdca700'
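The second osd.2 crash, `assert(obc->watchers.size() == 0)` in `ReplicatedPG::populate_obc_watchers()`, states a precondition: an object context's in-memory watcher set must be empty when it is populated, presumably from the watchers recorded with the object (the log line just before the assert shows one, `client.4478 cookie=1`). A minimal C++ sketch of that precondition, using hypothetical types `PersistedWatch` and `ObjectContextModel`, returns false where the real code would assert; populating the same context twice is one illustrative way to reach it, not a confirmed cause.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the precondition checked in
// ReplicatedPG::populate_obc_watchers(): an object context's in-memory
// watcher set must be empty before it is filled from the watchers
// recorded with the object. Types and names are illustrative only.
struct PersistedWatch {
  std::string client;  // e.g. "client.4478"
  std::uint64_t cookie;
};

struct ObjectContextModel {
  std::set<std::pair<std::string, std::uint64_t>> watchers;
};

// Returns false where the real code would fail
// assert(obc->watchers.size() == 0), i.e. when the context was already
// populated; otherwise copies the recorded watchers in and returns true.
bool populate_watchers(ObjectContextModel& obc,
                       const std::vector<PersistedWatch>& recorded) {
  if (!obc.watchers.empty()) return false;
  for (const auto& w : recorded) obc.watchers.insert({w.client, w.cookie});
  return true;
}
```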

http://85.214.49.87/ceph/20120124/osd.2.log.bz2



osd.3
-rw-------  1 root root 986M 2012-01-24 12:24 core-ceph-osd-962-1327404263-s-brick-003

log:
2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&, std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&, std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
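The backfill lines print versions as `epoch'version` pairs (e.g. `185'338312`), the notation Ceph uses for `eversion_t`. A small parser helps when lining up these ranges across the osd.0, osd.2 and osd.3 logs. The `EVersion` struct and `parse_eversion` function below are hypothetical, and ordering by epoch first, then version, is an assumption about how these pairs compare.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <tuple>

// Hypothetical parser for the "epoch'version" pairs in these log lines,
// e.g. "185'338312". EVersion is illustrative, not Ceph's eversion_t;
// ordering by epoch first, then version, is an assumption.
struct EVersion {
  unsigned epoch = 0;
  unsigned long long version = 0;

  bool operator<(const EVersion& o) const {
    return std::tie(epoch, version) < std::tie(o.epoch, o.version);
  }
};

std::optional<EVersion> parse_eversion(const std::string& s) {
  const auto q = s.find('\'');
  if (q == std::string::npos || q == 0 || q + 1 == s.size()) return std::nullopt;
  const std::string e = s.substr(0, q), v = s.substr(q + 1);
  // Reject anything but plain decimal digits on either side.
  for (const std::string* part : {&e, &v})
    if (part->find_first_not_of("0123456789") != std::string::npos)
      return std::nullopt;
  try {
    return EVersion{static_cast<unsigned>(std::stoul(e)), std::stoull(v)};
  } catch (const std::out_of_range&) {
    return std::nullopt;  // value too large to represent
  }
}
```

Tokens that are not `epoch'version` pairs, such as the `0//0` in the same log line, are rejected rather than guessed at.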

http://85.214.49.87/ceph/20120124/osd.3.log.bz2



-martin




* Re: osd crash during resync
From: Gregory Farnum @ 2012-01-24 21:13 UTC
  To: martin; +Cc: ceph-devel

On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand <martin@tuxadero.com> wrote:
> Hi,
> today I tried the btrfs patch mentioned on the btrfs mailing list. To do so,
> I rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
> brought osd.0 back into the cluster. During the resync of osd.0, osd.2 and
> osd.3 crashed.
> I am not sure whether the crashes happened because I played with osd.0 or
> whether they are bugs.

These are OSD-level issues not caused by btrfs, so your new kernel
definitely didn't do it. It's probably fallout from the backfill
changes that got merged in last week. I created new bugs to track
them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
Josh are going wild on some other issues that we've turned up, and
these have been added to the queue to be handled as soon as somebody
qualified can get to them. :)
-Greg


* Re: osd crash during resync
From: Martin Mailand @ 2012-01-24 21:22 UTC
  To: Gregory Farnum; +Cc: ceph-devel

Hi Greg,
OK, do you guys still need the core files, or can I delete them?

-martin

On 24.01.2012 22:13, Gregory Farnum wrote:
> On Tue, Jan 24, 2012 at 10:48 AM, Martin Mailand <martin@tuxadero.com> wrote:
>> Hi,
>> today I tried the btrfs patch mentioned on the btrfs mailing list. To do so,
>> I rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
>> brought osd.0 back into the cluster. During the resync of osd.0, osd.2 and
>> osd.3 crashed.
>> I am not sure whether the crashes happened because I played with osd.0 or
>> whether they are bugs.
>
> These are OSD-level issues not caused by btrfs, so your new kernel
> definitely didn't do it. It's probably fallout from the backfill
> changes that got merged in last week. I created new bugs to track
> them: http://tracker.newdream.net/issues/1982 (1983, 1984). Sam and
> Josh are going wild on some other issues that we've turned up, and
> these have been added to the queue to be handled as soon as somebody
> qualified can get to them. :)
> -Greg


* Re: osd crash during resync
From: Gregory Farnum @ 2012-01-24 21:25 UTC
  To: martin; +Cc: ceph-devel

On Tue, Jan 24, 2012 at 1:22 PM, Martin Mailand <martin@tuxadero.com> wrote:
> Hi Greg,
> OK, do you guys still need the core files, or can I delete them?

Sam thinks probably not, since we have the backtraces and the
logs... thanks for asking, though! :)
-Greg


* Re: osd crash during resync
From: Sage Weil @ 2012-01-25 22:08 UTC
  To: Martin Mailand; +Cc: ceph-devel

Hi Martin,

On Tue, 24 Jan 2012, Martin Mailand wrote:
> Hi,
> today I tried the btrfs patch mentioned on the btrfs mailing list. To do so,
> I rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
> brought osd.0 back into the cluster. During the resync of osd.0, osd.2 and
> osd.3 crashed.
> I am not sure whether the crashes happened because I played with osd.0 or
> whether they are bugs.
> 
> 
> osd.2
> -rw-------  1 root root 1.1G 2012-01-24 12:19
> core-ceph-osd-1000-1327403927-s-brick-002
> 
> log:
> 2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
> osd.0 from (185'113859,185'113859] 0//0 to 196'114038
> osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
> in thread '7f1fdab26700'
> osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
> 
> -rw-------  1 root root 758M 2012-01-24 15:58
> core-ceph-osd-20755-1327417128-s-brick-002

Can you post the log for osd.0 too?

Thanks!
sage



> 
> log:
> 2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
> 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
> r=0 lpr=376 rops=1 mlcod 202'286160 active m=6]  * oi->watcher: client.4478
> cookie=1
> osd/ReplicatedPG.cc: In function 'void
> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
> '7fe26fdca700'
> osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
> osd/ReplicatedPG.cc: In function 'void
> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
> '7fe26fdca700'
> 
> http://85.214.49.87/ceph/20120124/osd.2.log.bz2
> 
> 
> 
> osd.3
> -rw-------  1 root root 986M 2012-01-24 12:24
> core-ceph-osd-962-1327404263-s-brick-003
> 
> log:
> 2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
> on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
> 2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
> osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
> std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
> std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
> 
> http://85.214.49.87/ceph/20120124/osd.3.log.bz2
> 
> 
> 
> -martin
> 
> 


* Re: osd crash during resync
From: Martin Mailand @ 2012-01-26 10:18 UTC
  To: Sage Weil; +Cc: ceph-devel

Hi Sage,
I uploaded the osd.0 log as well.

http://85.214.49.87/ceph/20120124/osd.0.log.bz2

-martin

On 25.01.2012 23:08, Sage Weil wrote:
> Hi Martin,
>
> On Tue, 24 Jan 2012, Martin Mailand wrote:
>> Hi,
>> today I tried the btrfs patch mentioned on the btrfs mailing list. To do so,
>> I rebooted osd.0 with a new kernel, created a new btrfs on osd.0, and then
>> brought osd.0 back into the cluster. During the resync of osd.0, osd.2 and
>> osd.3 crashed.
>> I am not sure whether the crashes happened because I played with osd.0 or
>> whether they are bugs.
>>
>>
>> osd.2
>> -rw-------  1 root root 1.1G 2012-01-24 12:19
>> core-ceph-osd-1000-1327403927-s-brick-002
>>
>> log:
>> 2012-01-24 12:15:45.563135 7f1fdd42c700 log [INF] : 2.a restarting backfill on
>> osd.0 from (185'113859,185'113859] 0//0 to 196'114038
>> osd/PG.cc: In function 'void PG::finish_recovery_op(const hobject_t&, bool)',
>> in thread '7f1fdab26700'
>> osd/PG.cc: 1553: FAILED assert(recovery_ops_active > 0)
>>
>> -rw-------  1 root root 758M 2012-01-24 15:58
>> core-ceph-osd-20755-1327417128-s-brick-002
>
> Can you post the log for osd.0 too?
>
> Thanks!
> sage
>
>
>
>>
>> log:
>> 2012-01-24 15:58:48.356892 7fe26acbf700 osd.2 379 pg[2.ff( v 379'286211 lc
>> 202'286160 (185'285159,379'286211] n=112 ec=1 les/c 379/310 373/376/376) [2,1]
>> r=0 lpr=376 rops=1 mlcod 202'286160 active m=6]  * oi->watcher: client.4478
>> cookie=1
>> osd/ReplicatedPG.cc: In function 'void
>> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
>> '7fe26fdca700'
>> osd/ReplicatedPG.cc: 3199: FAILED assert(obc->watchers.size() == 0)
>> osd/ReplicatedPG.cc: In function 'void
>> ReplicatedPG::populate_obc_watchers(ReplicatedPG::ObjectContext*)', in thread
>> '7fe26fdca700'
>>
>> http://85.214.49.87/ceph/20120124/osd.2.log.bz2
>>
>>
>>
>> osd.3
>> -rw-------  1 root root 986M 2012-01-24 12:24
>> core-ceph-osd-962-1327404263-s-brick-003
>>
>> log:
>> 2012-01-24 12:15:50.241321 7f30c8fde700 log [INF] : 2.2e restarting backfill
>> on osd.0 from (185'338312,185'338312] 0//0 to 196'339910
>> 2012-01-24 12:21:48.420242 7f30c5ed7700 log [INF] : 2.9d scrub ok
>> osd/PG.cc: In function 'void PG::activate(ObjectStore::Transaction&,
>> std::list<Context*>&, std::map<int, std::map<pg_t, PG::Query> >&,
>> std::map<int, MOSDPGInfo*>*)', in thread '7f30c8fde700'
>>
>> http://85.214.49.87/ceph/20120124/osd.3.log.bz2
>>
>>
>>
>> -martin
>>
>>

