From: Yann Dupont <Yann.Dupont@univ-nantes.fr>
To: Samuel Just <sam.just@inktank.com>
Cc: Gregory Farnum <greg@inktank.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: domino-style OSD crash
Date: Tue, 10 Jul 2012 11:46:29 +0200
Message-ID: <4FFBF9F5.9050000@univ-nantes.fr>
In-Reply-To: <CA+4uBUYGaUdYr97EqEGCLk9ByhUeXuqeeDyp2bzVRN7GeokePg@mail.gmail.com>

On 09/07/2012 19:14, Samuel Just wrote:
> Can you restart the node that failed to complete the upgrade with

Well, it's a little bit complicated: I now run those nodes with XFS,
and I have long-running jobs on them right now, so I can't stop the Ceph
cluster at the moment.

As I've kept the original broken btrfs volumes, I tried this morning
to run the old OSD in parallel, using the $cluster variable. I only
had partial success.
I tried using different ports for the mons, but Ceph wants to use the old
mon map. I can edit it (epoch 1), but it seems to use 'latest' instead;
that format isn't compatible with monmaptool, and I don't know how to
"inject" the modified map into a cluster that isn't running.

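For illustration only: running the old OSD alongside the production one via $cluster relies on Ceph expanding metavariables such as $cluster and $id in configured paths, so that two cluster names resolve to separate config and data locations. The sketch below is a simplified Python model of that substitution, not Ceph's code. The path templates are assumptions (this setup uses /CEPH-PROD/data/... rather than any default), and the second cluster name "old" is hypothetical. Substitution only separates files on disk; it does not change anything already stored in the old monitor's data, such as the mon map mentioned above.

```python
import re

# Illustrative path templates using Ceph-style metavariables ($cluster, $id).
# These are assumptions for this sketch; real paths depend on the release
# and on ceph.conf (this setup uses /CEPH-PROD/data/... instead).
TEMPLATES = {
    "conf": "/etc/ceph/$cluster.conf",
    "osd_data": "/var/lib/ceph/osd/$cluster-$id",
}


def expand(template: str, values: dict) -> str:
    """Replace each $name with values[name]; unknown names are left untouched."""
    return re.sub(r"\$(\w+)", lambda m: values.get(m.group(1), m.group(0)), template)


def paths_for(cluster: str, osd_id: str) -> dict:
    """Expand every template for one cluster name and OSD id."""
    values = {"cluster": cluster, "id": osd_id}
    return {key: t and expand(t, values) for key, t in TEMPLATES.items()}


if __name__ == "__main__":
    # "ceph" is the default cluster name; "old" is a hypothetical second one.
    for cluster in ("ceph", "old"):
        print(cluster, paths_for(cluster, "1"))
```

Because every file location is derived from the cluster name, the two OSD instances cannot collide on config or data paths in this model.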
Anyway, the OSD seems to start fine, and I can reproduce the bug:
> debug filestore = 20
> debug osd = 20
>

I've put them in [global]; is that sufficient?
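A minimal sketch of why [global] should be enough, assuming the usual ceph.conf lookup order in which a daemon such as osd.1 checks [osd.1], then [osd], then [global]. This is a simplified Python model built on configparser, not Ceph's own parser; the helper names load and lookup are hypothetical, and the model covers only section precedence plus the equivalence of spaces and underscores in option names.

```python
import configparser

# The two settings from the quoted message, placed in [global].
CONF = """
[global]
debug filestore = 20
debug osd = 20
"""


def load(text: str) -> configparser.ConfigParser:
    """Parse ceph.conf-style text (simplified model, not Ceph's own parser)."""
    conf = configparser.ConfigParser(interpolation=None)
    # Treat "debug osd" and "debug_osd" as the same option name.
    conf.optionxform = lambda name: name.strip().lower().replace("_", " ")
    conf.read_string(text)
    return conf


def lookup(conf: configparser.ConfigParser, daemon: str, option: str):
    """Return `option` for `daemon` (e.g. "osd.1"), checking the most specific
    section first: [osd.1], then [osd], then [global]. None if unset."""
    daemon_type = daemon.split(".", 1)[0]
    for section in (daemon, daemon_type, "global"):
        if conf.has_section(section) and conf.has_option(section, option):
            return conf.get(section, option)
    return None


if __name__ == "__main__":
    conf = load(CONF)
    # osd.1 has no section of its own, so both values come from [global].
    print(lookup(conf, "osd.1", "debug_osd"), lookup(conf, "osd.1", "debug filestore"))
```

In this model the same keys could also go in [osd] or [osd.1]; the more specific section wins, which is only useful if other daemons should keep their default debug levels.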

>
> and post the log after an hour or so of running?  The upgrade process
> might legitimately take a while.
> -Sam
It has only been running for 15 minutes, but ceph-osd is consuming lots
of CPU, and strace shows lots of pread calls.

Here is the log:

[..]
2012-07-10 11:33:29.560052 7f3e615ac780  0 filestore(/CEPH-PROD/data/osd.1) mount syncfs(2) syscall not support by glibc
2012-07-10 11:33:29.560062 7f3e615ac780  0 filestore(/CEPH-PROD/data/osd.1) mount no syncfs(2), but the btrfs SYNC ioctl will suffice
2012-07-10 11:33:29.560172 7f3e615ac780 -1 filestore(/CEPH-PROD/data/osd.1) FileStore::mount : stale version stamp detected: 2. Proceeding, do_update is set, performing disk format upgrade.
2012-07-10 11:33:29.560233 7f3e615ac780  0 filestore(/CEPH-PROD/data/osd.1) mount found snaps <3744666,3746725>
2012-07-10 11:33:29.560263 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1)  current/ seq was 3746725
2012-07-10 11:33:29.560267 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1)  most recent snap from <3744666,3746725> is 3746725
2012-07-10 11:33:29.560280 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) mount rolling back to consistent snap 3746725
2012-07-10 11:33:29.839281 7f3e615ac780  5 filestore(/CEPH-PROD/data/osd.1) mount op_seq is 3746725


... and nothing more.
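The mount lines above can be re-derived mechanically. The sketch below pulls the snap list, the current/ seq and the op_seq out of the log text and recomputes the rollback target, assuming (as the log's own "most recent snap ... is 3746725" line indicates) that the most recent snap is the one with the largest sequence number. summarize is a hypothetical helper name. It only restates what the log prints; it says nothing about why the OSD then goes quiet.

```python
import re

# Excerpt of the log above, timestamps trimmed. Entries may be wrapped.
LOG = """
filestore(/CEPH-PROD/data/osd.1) mount found snaps <3744666,3746725>
filestore(/CEPH-PROD/data/osd.1)  current/ seq was 3746725
filestore(/CEPH-PROD/data/osd.1) mount op_seq is 3746725
"""


def summarize(log: str) -> dict:
    """Extract the FileStore mount values and the most recent snap."""
    # Collapse all whitespace so entries wrapped by a mail client still match.
    text = " ".join(log.split())
    snaps_m = re.search(r"mount found snaps <([\d,]+)>", text)
    seq_m = re.search(r"current/ seq was (\d+)", text)
    op_m = re.search(r"mount op_seq is (\d+)", text)
    if not (snaps_m and seq_m and op_m):
        raise ValueError("expected FileStore mount lines not found")
    snaps = [int(s) for s in snaps_m.group(1).split(",")]
    return {
        "snaps": snaps,
        "current_seq": int(seq_m.group(1)),
        # Assumption: "most recent" means the largest sequence number.
        "most_recent_snap": max(snaps),
        "op_seq": int(op_m.group(1)),
    }


if __name__ == "__main__":
    info = summarize(LOG)
    print(info)
    print("rollback target equals current/ seq:",
          info["most_recent_snap"] == info["current_seq"])
```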

I'll let it run for 3 hours. If another message appears, I'll let
you know.

Cheers,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


