All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Zafman <dzafman@redhat.com>
To: Vincent Godin <vince.mlist@gmail.com>,
	Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Strange behavior of OSD after an IO error
Date: Wed, 27 Sep 2017 11:04:30 -0700	[thread overview]
Message-ID: <e65666dc-ee50-8d2b-daab-e9b247815c19@redhat.com> (raw)
In-Reply-To: <CAL2-6bLM8HMNkqPsF401TX2ppANBVtD_h40=kYg1uisAa3ruOw@mail.gmail.com>


The support for this was included in the Luminous release: 
https://github.com/ceph/ceph/pull/14760

David


On 9/27/17 7:37 AM, Vincent Godin wrote:
> I have a pretty big ceph cluster in jewel 10.2.7 (2 PB) and IO errors
> are common. I notice two different behaviors of OSDs when they
> encounter an IO error
>
> On JBOD mode, the OSD get the IO error and then abort ... The pgs are
> peering on others OSDs, the OSD restart, get it's pgs and everything
> is going as usual. Ceph health is OK and the degraded object is not
> declared to the cluster. So, if another read is needed on the same
> object, the process will restart (abort, perring, restart, peering).
> Only a deep-scrub will tell the cluster that one pg have a degraded
> object. Then a repair command will repair the pg (only a write
> operation can force a disk to reallocate a bad sector). Why the
> degraded object is not declared when the first read operation get the
> IO error ? If we have multiple IO errors on different OSDs, different
> read operations will make them flap till a deep-scrub ... The abort
> operation seems to be the current behavior according to the code !!!
> How the rbd_client deals with that (IO error or retry till the new
> peering of the pg ?)
>
> osd log
>
> IO error and abort
>
> 2017-09-27 00:55:52.100678 7faceba7b700 -1
> filestore(/var/lib/ceph/osd/ceph-276)
> FileStore::read(9.103_head/#9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head#)
> pread error: (5) Input/output error
> 2017-09-27 00:55:52.128147 7faceba7b700 -1 os/filestore/FileStore.cc:
> In function 'virtual int FileStore::read(const coll_t&, const
> ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool) '
> thread 7faceba7b700 time 2017-09-27 00:55:52.101208
> os/filestore/FileStore.cc: 3016: FAILED assert(0 == "eio on pread")
>
>   ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x7fad141219f5]
>   2: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long,
> unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd91)
> [0x7fad13df0661]
> ...
>
> then restart
>
> 2017-09-27 00:56:26.859560 7fd53b8b0800  0 osd.276 341224 load_pgs
> 2017-09-27 00:56:35.881155 7fd53b8b0800  0 osd.276 341224 load_pgs
> opened 167 pgs
> 2017-09-27 00:56:35.881531 7fd53b8b0800  0 osd.276 341224 using 0 op
> queue with priority op cut off at 64.
> 2017-09-27 00:56:35.883165 7fd53b8b0800 -1 osd.276 341224
> log_to_monitors {default=true}
> 2017-09-27 00:56:36.875207 7fd53b8b0800  0 osd.276 341224 done with
> init, starting boot process
>
> then deep-scrub (manually)
>
> 2017-09-27 10:34:22.481133 7fd50b196700  0 log_channel(cluster) log
> [INF] : 9.103 deep-scrub starts
> 2017-09-27 10:44:05.819995 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 shard 276: soid
> 9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head candidate
> had a read error
> 2017-09-27 10:55:22.272655 7fd508991700 -1 log_channel(cluster) log
> [ERR] : 9.103 deep-scrub 0 missing, 1 inconsistent objects
> 2017-09-27 10:55:22.272663 7fd508991700 -1 log_channel(cluster) log
> [ERR] : 9.103 deep-scrub 1 errors
>
> then repair
>
> 2017-09-27 11:01:10.648292 7fd50b196700  0 log_channel(cluster) log
> [INF] : 9.103 repair starts
> 2017-09-27 11:08:57.329571 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 shard 276: soid
> 9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head candidate
> had a read error
> 2017-09-27 11:18:42.498541 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 repair 0 missing, 1 inconsistent objects
> 2017-09-27 11:18:42.498730 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 repair 1 errors, 1 fixed
>
>
> The second behavior is more linked to the hardware and i just want to
> give a drawback of using raid0 disk
>
> On our cluster, whe used OSD hosts with JBOD and some olders with RAID0 disks
> On RAID0 hosts, we notice than when a IO error occured, the OSD never
> catch the information (it only appears in dmesg). On OSD log, we can
> only see the thread timing out and blocked requests. The OSD behaviour
> then depend of the load. Sometime it survive, sometime not but from a
> client side, i think it's an IO error for sure. It's probably linked
> to my hardware (SmartArray on HP SL4540) but maybe not ...
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


      reply	other threads:[~2017-09-27 18:04 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-27 14:37 Strange behavior of OSD after an IO error Vincent Godin
2017-09-27 18:04 ` David Zafman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e65666dc-ee50-8d2b-daab-e9b247815c19@redhat.com \
    --to=dzafman@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=vince.mlist@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.