From: "Łukasz Chrustek" <skidoo@tlen.pl>
To: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Problem with query and any operation on PGs
Date: Wed, 24 May 2017 17:54:47 +0200
Message-ID: <806057225.20170524175447@tlen.pl>
In-Reply-To: <alpine.DEB.2.11.1705241510290.3646@piezo.novalocal>

Hello,

> On Wed, 24 May 2017, Łukasz Chrustek wrote:

>> Hello,
>> 
>> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> Hi,
>> >> 
>> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> > On Wed, 24 May 2017, Łukasz Chrustek wrote:
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> Hi,
>> >> >> >> >> 
>> >> >> >> >> > On Tue, 23 May 2017, Łukasz Chrustek wrote:
>> >> >> >> >> >> I haven't slept for over 30 hours and still can't find a solution. I
>> >> >> >> >> >> did as you wrote, but turning off these OSDs
>> >> >> >> >> >> (https://pastebin.com/1npBXeMV) didn't resolve the issue...
>> >> >> >> >> 
>> >> >> >> >> > The important bit is:
>> >> >> >> >> 
>> >> >> >> >> >             "blocked": "peering is blocked due to down osds",
>> >> >> >> >> >             "down_osds_we_would_probe": [
>> >> >> >> >> >                 6,
>> >> >> >> >> >                 10,
>> >> >> >> >> >                 33,
>> >> >> >> >> >                 37,
>> >> >> >> >> >                 72
>> >> >> >> >> >             ],
>> >> >> >> >> >             "peering_blocked_by": [
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 6,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 10,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 37,
>> >> >> >> >> >                     "current_lost_at": 0,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> >> >> >> >                 },
>> >> >> >> >> >                 {
>> >> >> >> >> >                     "osd": 72,
>> >> >> >> >> >                     "current_lost_at": 113771,
>> >> >> >> >> >                     "comment": "starting or marking this osd lost may let us proceed"
>> >> 
>> >> > These are the osds (6, 10, 37, 72).
>> >> 
>> >> >> >> >> >                 }
>> >> >> >> >> >             ]
>> >> >> >> >> >         },
>> >> >> >> >> 
>> >> >> >> >> > Are any of those OSDs startable?
>> >> 
>> >> > This
>> >> 
>> >> osd 6 isn't startable
>> 
>> > Disk completely 100% dead, or just broken enough that ceph-osd won't 
>> > start?  ceph-objectstore-tool can be used to extract a copy of the 2 pgs
>> > from this osd to recover any important writes on that osd.
>> 
>> 2017-05-24 11:21:23.341938 7f6830a36940  0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process ceph-osd, pid 1375
>> 2017-05-24 11:21:23.350180 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) backend btrfs (magic 0x9123683e)
>> 2017-05-24 11:21:23.350610 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: FIEMAP ioctl is supported and appears to work
>> 2017-05-24 11:21:23.350617 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2017-05-24 11:21:23.350633 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: splice is supported
>> 2017-05-24 11:21:23.351897 7f6830a36940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2017-05-24 11:21:23.351951 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: CLONE_RANGE ioctl is supported
>> 2017-05-24 11:21:23.351970 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to create simple subvolume test_subvol: (17) File exists
>> 2017-05-24 11:21:23.351981 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE is supported
>> 2017-05-24 11:21:23.351984 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.351987 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed with EPERM as non-root; remount with -o user_subvol_rm_allowed
>> 2017-05-24 11:21:23.351996 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: snaps enabled, but no SNAP_DESTROY ioctl; DISABLING
>> 2017-05-24 11:21:23.352573 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: START_SYNC is supported (transid 252877)
>> 2017-05-24 11:21:23.353001 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: WAIT_SYNC is supported
>> 2017-05-24 11:21:23.353012 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: removing old async_snap_test
>> 2017-05-24 11:21:23.353016 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove old async_snap_test: (1) Operation not permitted
>> 2017-05-24 11:21:23.353021 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_CREATE_V2 is supported
>> 2017-05-24 11:21:23.353022 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: SNAP_DESTROY failed: (1) Operation not permitted
>> 2017-05-24 11:21:23.353027 7f6830a36940  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-6) detect_feature: failed to remove test_subvol: (1) Operation not permitted
>> 2017-05-24 11:21:23.355156 7f6830a36940  0 filestore(/var/lib/ceph/osd/ceph-6) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>> 2017-05-24 11:21:23.355881 7f6830a36940 -1 filestore(/var/lib/ceph/osd/ceph-6) could not find -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>> 2017-05-24 11:21:23.355891 7f6830a36940 -1 osd.6 0 OSD::init() : unable to read osd superblock
>> 2017-05-24 11:21:23.356411 7f6830a36940 -1  ** ERROR: osd init failed: (22) Invalid argument
>> 
>> That is all I get in the logs for this osd when I try to start it.
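
In a startup log like the one above, most entries are priority 0 feature probes; the lines that explain the failure carry priority -1. A minimal sketch for pulling out just those lines, assuming the field layout seen here (date, time, thread id, priority, message). The function name `osd_log_errors` is made up for this illustration:

```shell
#!/bin/sh
# Print only the priority -1 entries from a ceph-osd log read on stdin.
# Assumes the layout shown in this thread: date, time, thread id, priority, message.
osd_log_errors() {
    # Field 4 is the priority; compare it as a string so "0" never matches.
    awk '$4 == "-1"'
}
```

Run against the log quoted above, this should leave only the three lines about the missing osd_superblock and the failed init. It would be fed from wherever the osd.6 log lives on the host, for example `osd_log_errors < /path/to/ceph-osd.6.log`.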
>> 
>> >> osd 10, 37, 72 are startable
>> 
>> > With those started, I'd repeat the original sequence and get a fresh pg
>> > query to confirm that it still wants just osd.6.
>> 
>> Do you mean the procedure with the loop, taking down the OSDs that the broken
>> PGs point to?
>> pg 1.60 is down+remapped+peering, acting [66,40]
>> pg 1.165 is down+peering, acting [67,88,48]
>> 
>> For pg 1.60, take osd 66 down, then check the pg query in a loop?

> Right.

And now it gets very weird... I brought osd.37 up, and the loop
while true; do ceph tell 1.165 query; done

caught this:

https://pastebin.com/zKu06fJn

Can you tell what is wrong now?
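
The query loop prints the whole pg state on every pass, which makes it hard to see whether the set of blocking OSDs has changed. A rough sketch that keeps only the ids listed under "peering_blocked_by", assuming the pretty-printed layout quoted earlier in the thread (one key per line). The function names `blocking_osds` and `watch_pg` and the 5-second default pause are assumptions made for this illustration, and a real JSON parser would be more robust than this line matching:

```shell
#!/bin/sh
# List the OSD ids inside the "peering_blocked_by" array of a pretty-printed
# pg query read on stdin. Line-oriented, so it depends on that layout.
blocking_osds() {
    awk '
        # Enter the array, unless it is opened and closed on the same line.
        /"peering_blocked_by"/ { in_block = ($0 !~ /\]/); next }
        # A line starting with "]" closes the array.
        in_block && /^[ \t]*\]/ { in_block = 0 }
        # Inside the array, keep only the digits of each "osd" entry.
        in_block && /"osd":/ {
            gsub(/[^0-9]/, "", $0)
            print
        }
    '
}

# Poll a pg and print the blocking OSDs on each pass (defined, not run here).
# $1 is the pgid, $2 an optional pause in seconds.
watch_pg() {
    while true; do
        ceph tell "$1" query | blocking_osds
        sleep "${2:-5}"
    done
}
```

With the output quoted earlier this would print 6, 10, 37 and 72, one per line. `watch_pg 1.165` then shows when that list shrinks as OSDs come up.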

>> > use ceph-objectstore-tool to export the pg from osd.6, stop some other
>> > random osd (not one of these ones), import the pg into that osd, and start
>> > again.  once it is up, 'ceph osd lost 6'.  the pg *should* peer at that
>> > point.  repeat with the same basic process with the other pg.
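
The export/import sequence described above can be written down as a dry run that only prints the commands, so they can be reviewed before anything is touched. This is a sketch under stated assumptions: the data and journal paths follow the /var/lib/ceph/osd/ceph-<id> layout seen in the osd.6 log, the donor OSD id and dump file are placeholders, the function name `plan_pg_move` is made up for this illustration, and the stop/start steps are left as comments because they depend on the init system. ceph-objectstore-tool needs the OSD it operates on to be stopped.

```shell
#!/bin/sh
# Print (do not execute) the steps for moving one pg from a dead OSD to a donor OSD.
# $1 pgid, $2 source osd id, $3 donor osd id, $4 dump file (all supplied by the caller).
plan_pg_move() {
    pgid=$1
    src=$2
    dst=$3
    dump=$4
    src_dir=/var/lib/ceph/osd/ceph-$src   # assumed default data path
    dst_dir=/var/lib/ceph/osd/ceph-$dst   # assumed default data path

    echo "ceph-objectstore-tool --data-path $src_dir --journal-path $src_dir/journal --pgid $pgid --op export --file $dump"
    echo "# stop osd.$dst here (init-system specific)"
    echo "ceph-objectstore-tool --data-path $dst_dir --journal-path $dst_dir/journal --op import --file $dump"
    echo "# start osd.$dst again and wait for it to come up, then:"
    echo "ceph osd lost $src --yes-i-really-mean-it"
}
```

For example, `plan_pg_move 1.60 6 12 /tmp/pg.1.60.export` prints the plan for pg 1.60 with a hypothetical donor osd.12. Whether the export from osd.6 succeeds, given its superblock error, is not established in this message.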
>> 
>> I have already done 'ceph osd lost 6'; do I need to do it again?

> Hmm not sure, if the OSD is empty then there is no harm in doing it again.
> Try that first since it might resolve it.  If not, do the query loop 
> above.

> s



-- 
Regards,
 Łukasz Chrustek

