Re: Problems after crash yesterday

From: Gregory Farnum <gregory.farnum@dreamhost.com>
To: "Jens Rehpöhler" <jens.rehpoehler@filoo.de>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"sage@newdream.net" <sage@newdream.net>
Subject: Re: Problems after crash yesterday
Date: Wed, 22 Feb 2012 09:12:04 -0800
Message-ID: <406144037926269578@unknownmsgid>
In-Reply-To: <4F44BB25.10202@filoo.de>

On Feb 22, 2012, at 1:53 AM, "Jens Rehpöhler" <jens.rehpoehler@filoo.de> wrote:

> Some additions: meanwhile we are at this state:
>
> 2012-02-22 10:38:49.587403    pg v1044553: 2046 pgs: 2036 active+clean,
> 10 active+clean+inconsistent; 2110 GB data, 4061 GB used, 25732 GB /
> 29794 GB avail
>
> The active+recovering+remapped+backfill state disappeared after a restart
> of a crashed OSD.
>
> The OSD crashed after issuing the command "ceph pg repair 106.3".
>
> The repeating message is also there:
Hmm. These messages indicate there are requests that came in that
never got answered -- or else that the tracking code isn't quite right
(it's new functionality). What version are you running?

> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:32.182488 osd.3
> 10.10.10.8:6803/29916 302906 : [WRN] old request pg_log(0.ea epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:32.182500 osd.3
> 10.10.10.8:6803/29916 302907 : [WRN] old request pg_log(2.e8 epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
> flag points reached
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:33.182615 osd.3
> 10.10.10.8:6803/29916 302908 : [WRN] old request pg_log(0.ea epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:33.182629 osd.3
> 10.10.10.8:6803/29916 302909 : [WRN] old request pg_log(2.e8 epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
> flag points reached
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:34.182839 osd.3
> 10.10.10.8:6803/29916 302910 : [WRN] old request pg_log(0.ea epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:34.182853 osd.3
> 10.10.10.8:6803/29916 302911 : [WRN] old request pg_log(2.e8 epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
> flag points reached
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:35.183075 osd.3
> 10.10.10.8:6803/29916 302912 : [WRN] old request pg_log(0.ea epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
> 2012-02-22 10:52:36.198983   log 2012-02-22 10:52:35.183089 osd.3
> 10.10.10.8:6803/29916 302913 : [WRN] old request pg_log(2.e8 epoch 849
> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
> flag points reached
>
> Seems to hang since our crash.
>
> Lastly, we see some scrub errors like this:
>
> 2012-02-22 10:47:35.049386 log 2012-02-22 10:47:25.310571 osd.4
> 10.10.10.10:6800/17745 34356 : [ERR] 16.4 osd.2: soid
> ce7f1004/rb.0.0.00000000001a/headmissing attr _, missing attr
And that's a problem with the xattrs. What filesystem are you using
underneath Ceph?

>
> Any advice?
>
> thanks
>
> Jens
>
>
>
> On 21.02.2012 11:24, Jens Rehpöhler wrote:
>> Hi sage,
>>
>> sorry ... we have to disturb you again.
>>
>> After the node crash (oli wrote about that) we have some problems.
>>
>> The recovery process is stuck at:
>>
>> 2012-02-21 11:20:15.948527    pg v986715: 2046 pgs: 2035 active+clean,
>> 10 active+clean+inconsistent, 1 active+recovering+remapped+backfill;
>> 1988 GB data, 3823 GB used, 25970 GB / 29794 GB avail; 1/1121879
>> degraded (0.000%)
>>
>> We also see these messages every few seconds:
>>
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:05.765762 osd.3
>> 10.10.10.8:6803/29916 131581 : [WRN] old request pg_log(0.ea epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:05.765775 osd.3
>> 10.10.10.8:6803/29916 131582 : [WRN] old request pg_log(2.e8 epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
>> flag points reached
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:06.765912 osd.3
>> 10.10.10.8:6803/29916 131583 : [WRN] old request pg_log(0.ea epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:06.765943 osd.3
>> 10.10.10.8:6803/29916 131584 : [WRN] old request pg_log(2.e8 epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
>> flag points reached
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:07.766312 osd.3
>> 10.10.10.8:6803/29916 131585 : [WRN] old request pg_log(0.ea epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:07.766324 osd.3
>> 10.10.10.8:6803/29916 131586 : [WRN] old request pg_log(2.e8 epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774662 currently no
>> flag points reached
>> 2012-02-21 11:20:15.106958   log 2012-02-21 11:20:08.766467 osd.3
>> 10.10.10.8:6803/29916 131587 : [WRN] old request pg_log(0.ea epoch 849
>> query_epoch 843) v2 received at 2012-02-20 17:39:41.774507 currently started
>>
>> Any ideas how we can get the cluster back to a consistent state?
>>
>> Thank you !!
>>
>> Jens
>
>
> --
> Kind regards
>
> Jens Rehpöhler
>
> ----------------------------------------------------------------------
> Filoo GmbH
> Moltkestr. 25a
> 33330 Gütersloh
> HRB4355 AG Gütersloh
>
> Managing directors: S.Grewing | J.Rehpöhler | Dr. C.Kunz
> Phone: +49 5241 8673012 | Mobile: +49 151 54645798
> Hotline: 07000-3378658 (14 ct/min) Fax: +49 5241 8673020
>
>