From: Wido den Hollander <wido@42on.com>
To: Samuel Just <sam.just@inktank.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: OSDs crashing with Operation Not Permitted on reading PGLog
Date: Mon, 27 Oct 2014 22:16:19 +0100
Message-ID: <544EB623.8090205@42on.com>
In-Reply-To: <CA+4uBUaN8STAmfE7CkQvrBPK8UUGiybRbvgfMWbKp1784gVDaw@mail.gmail.com>
On 10/27/2014 10:05 PM, Samuel Just wrote:
> Try reproducing with an strace.
I did so and this is the result:
http://ceph.o.auroraobjects.eu/ceph-osd.25.strace.gz
It does this stat:
stat("/var/lib/ceph/osd/ceph-25/current/meta/DIR_D/DIR_C"
That fails with: -1 ENOENT (No such file or directory)
Afterwards it opens this pglog:
/var/lib/ceph/osd/ceph-25/current/meta/DIR_D/pglog\\u14.1a56__0_A1630ECD__none
That file is, however, 0 bytes (as are all other files in the same directory).
Afterwards the OSD asserts and writes to the log.
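
For anyone who wants to repeat the size check on another OSD, here is a minimal sketch in Python. The function name find_empty_files is hypothetical and not part of any Ceph tooling; it only lists the regular files in one directory that are 0 bytes. An empty file is not necessarily a fault by itself: the transaction dump quoted below shows omap operations on this object, so its data may live in the omap store rather than in the file body.

```python
import os


def find_empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes.

    Hypothetical helper for illustration; it does not descend into
    subdirectories and does not follow symlinks.
    """
    empty = []
    for entry in os.scandir(directory):
        # Skip directories (such as nested DIR_* levels) and symlinks.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_size == 0:
            empty.append(entry.name)
    return sorted(empty)
```

It could be called as find_empty_files("/var/lib/ceph/osd/ceph-25/current/meta/DIR_D"), the directory from the strace above, run as a user that can read the OSD data directory.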
Wido
> -Sam
>
> On Mon, Oct 27, 2014 at 2:02 PM, Wido den Hollander <wido@42on.com> wrote:
>> Hi,
>>
>> On a 0.80.7 cluster I'm experiencing a couple of OSDs refusing to start
>> due to a crash they encounter when reading the PGLog.
>>
>> A snippet of the log:
>>
>> -11> 2014-10-27 21:56:04.690046 7f034a006800 10
>> filestore(/var/lib/ceph/osd/ceph-25) _do_transaction on 0x392e600
>> -10> 2014-10-27 21:56:04.690078 7f034a006800 20
>> filestore(/var/lib/ceph/osd/ceph-25) _check_global_replay_guard no xattr
>> -9> 2014-10-27 21:56:04.690140 7f034a006800 20
>> filestore(/var/lib/ceph/osd/ceph-25) _check_replay_guard no xattr
>> -8> 2014-10-27 21:56:04.690150 7f034a006800 15
>> filestore(/var/lib/ceph/osd/ceph-25) touch meta/a1630ecd/pglog_14.1a56/0//-1
>> -7> 2014-10-27 21:56:04.690184 7f034a006800 10
>> filestore(/var/lib/ceph/osd/ceph-25) touch
>> meta/a1630ecd/pglog_14.1a56/0//-1 = 0
>> -6> 2014-10-27 21:56:04.690196 7f034a006800 15
>> filestore(/var/lib/ceph/osd/ceph-25) _omap_rmkeys
>> meta/a1630ecd/pglog_14.1a56/0//-1
>> -5> 2014-10-27 21:56:04.690290 7f034a006800 10 filestore oid:
>> a1630ecd/pglog_14.1a56/0//-1 not skipping op, *spos 1435883.0.2
>> -4> 2014-10-27 21:56:04.690295 7f034a006800 10 filestore >
>> header.spos 0.0.0
>> -3> 2014-10-27 21:56:04.690314 7f034a006800 0
>> filestore(/var/lib/ceph/osd/ceph-25) error (1) Operation not permitted
>> not handled on operation 33 (1435883.0.2, or op 2, counting from 0)
>> -2> 2014-10-27 21:56:04.690325 7f034a006800 0
>> filestore(/var/lib/ceph/osd/ceph-25) unexpected error code
>> -1> 2014-10-27 21:56:04.690327 7f034a006800 0
>> filestore(/var/lib/ceph/osd/ceph-25) transaction dump:
>> { "ops": [
>> { "op_num": 0,
>> "op_name": "nop"},
>> { "op_num": 1,
>> "op_name": "touch",
>> "collection": "meta",
>> "oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1"},
>> { "op_num": 2,
>> "op_name": "omap_rmkeys",
>> "collection": "meta",
>> "oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1"},
>> { "op_num": 3,
>> "op_name": "omap_setkeys",
>> "collection": "meta",
>> "oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1",
>> "attr_lens": { "can_rollback_to": 12}}]}
>> 0> 2014-10-27 21:56:04.691992 7f034a006800 -1 os/FileStore.cc: In
>> function 'unsigned int
>> FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int,
>> ThreadPool::TPHandle*)' thread 7f034a006800 time 2014-10-27 21:56:04.690368
>> os/FileStore.cc: 2559: FAILED assert(0 == "unexpected error")
>>
>>
>> The backing XFS filesystem seems to be OK, but isn't this a leveldb
>> issue where the omap information is stored?
>>
>> Anyone seen this before? I have about 5 OSDs (out of the 336) which are
>> showing this problem when booting.
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on