* Strange behavior of OSD after an IO error
@ 2017-09-27 14:37 Vincent Godin
2017-09-27 18:04 ` David Zafman
0 siblings, 1 reply; 2+ messages in thread
From: Vincent Godin @ 2017-09-27 14:37 UTC (permalink / raw)
To: Ceph Development
I have a pretty big ceph cluster in jewel 10.2.7 (2 PB) and IO errors
are common. I notice two different behaviors of OSDs when they
encounter an IO error
On JBOD mode, the OSD get the IO error and then abort ... The pgs are
peering on others OSDs, the OSD restart, get it's pgs and everything
is going as usual. Ceph health is OK and the degraded object is not
declared to the cluster. So, if another read is needed on the same
object, the process will restart (abort, perring, restart, peering).
Only a deep-scrub will tell the cluster that one pg have a degraded
object. Then a repair command will repair the pg (only a write
operation can force a disk to reallocate a bad sector). Why the
degraded object is not declared when the first read operation get the
IO error ? If we have multiple IO errors on different OSDs, different
read operations will make them flap till a deep-scrub ... The abort
operation seems to be the current behavior according to the code !!!
How the rbd_client deals with that (IO error or retry till the new
peering of the pg ?)
osd log
IO error and abort
2017-09-27 00:55:52.100678 7faceba7b700 -1
filestore(/var/lib/ceph/osd/ceph-276)
FileStore::read(9.103_head/#9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head#)
pread error: (5) Input/output error
2017-09-27 00:55:52.128147 7faceba7b700 -1 os/filestore/FileStore.cc:
In function 'virtual int FileStore::read(const coll_t&, const
ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool) '
thread 7faceba7b700 time 2017-09-27 00:55:52.101208
os/filestore/FileStore.cc: 3016: FAILED assert(0 == "eio on pread")
ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7fad141219f5]
2: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long,
unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd91)
[0x7fad13df0661]
...
then restart
2017-09-27 00:56:26.859560 7fd53b8b0800 0 osd.276 341224 load_pgs
2017-09-27 00:56:35.881155 7fd53b8b0800 0 osd.276 341224 load_pgs
opened 167 pgs
2017-09-27 00:56:35.881531 7fd53b8b0800 0 osd.276 341224 using 0 op
queue with priority op cut off at 64.
2017-09-27 00:56:35.883165 7fd53b8b0800 -1 osd.276 341224
log_to_monitors {default=true}
2017-09-27 00:56:36.875207 7fd53b8b0800 0 osd.276 341224 done with
init, starting boot process
then deep-scrub (manually)
2017-09-27 10:34:22.481133 7fd50b196700 0 log_channel(cluster) log
[INF] : 9.103 deep-scrub starts
2017-09-27 10:44:05.819995 7fd50b196700 -1 log_channel(cluster) log
[ERR] : 9.103 shard 276: soid
9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head candidate
had a read error
2017-09-27 10:55:22.272655 7fd508991700 -1 log_channel(cluster) log
[ERR] : 9.103 deep-scrub 0 missing, 1 inconsistent objects
2017-09-27 10:55:22.272663 7fd508991700 -1 log_channel(cluster) log
[ERR] : 9.103 deep-scrub 1 errors
then repair
2017-09-27 11:01:10.648292 7fd50b196700 0 log_channel(cluster) log
[INF] : 9.103 repair starts
2017-09-27 11:08:57.329571 7fd50b196700 -1 log_channel(cluster) log
[ERR] : 9.103 shard 276: soid
9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head candidate
had a read error
2017-09-27 11:18:42.498541 7fd50b196700 -1 log_channel(cluster) log
[ERR] : 9.103 repair 0 missing, 1 inconsistent objects
2017-09-27 11:18:42.498730 7fd50b196700 -1 log_channel(cluster) log
[ERR] : 9.103 repair 1 errors, 1 fixed
The second behavior is more linked to the hardware and i just want to
give a drawback of using raid0 disk
On our cluster, whe used OSD hosts with JBOD and some olders with RAID0 disks
On RAID0 hosts, we notice than when a IO error occured, the OSD never
catch the information (it only appears in dmesg). On OSD log, we can
only see the thread timing out and blocked requests. The OSD behaviour
then depend of the load. Sometime it survive, sometime not but from a
client side, i think it's an IO error for sure. It's probably linked
to my hardware (SmartArray on HP SL4540) but maybe not ...
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Strange behavior of OSD after an IO error
2017-09-27 14:37 Strange behavior of OSD after an IO error Vincent Godin
@ 2017-09-27 18:04 ` David Zafman
0 siblings, 0 replies; 2+ messages in thread
From: David Zafman @ 2017-09-27 18:04 UTC (permalink / raw)
To: Vincent Godin, Ceph Development
The support for this was included in the Luminous release:
https://github.com/ceph/ceph/pull/14760
David
On 9/27/17 7:37 AM, Vincent Godin wrote:
> I have a pretty big ceph cluster in jewel 10.2.7 (2 PB) and IO errors
> are common. I notice two different behaviors of OSDs when they
> encounter an IO error
>
> On JBOD mode, the OSD get the IO error and then abort ... The pgs are
> peering on others OSDs, the OSD restart, get it's pgs and everything
> is going as usual. Ceph health is OK and the degraded object is not
> declared to the cluster. So, if another read is needed on the same
> object, the process will restart (abort, perring, restart, peering).
> Only a deep-scrub will tell the cluster that one pg have a degraded
> object. Then a repair command will repair the pg (only a write
> operation can force a disk to reallocate a bad sector). Why the
> degraded object is not declared when the first read operation get the
> IO error ? If we have multiple IO errors on different OSDs, different
> read operations will make them flap till a deep-scrub ... The abort
> operation seems to be the current behavior according to the code !!!
> How the rbd_client deals with that (IO error or retry till the new
> peering of the pg ?)
>
> osd log
>
> IO error and abort
>
> 2017-09-27 00:55:52.100678 7faceba7b700 -1
> filestore(/var/lib/ceph/osd/ceph-276)
> FileStore::read(9.103_head/#9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head#)
> pread error: (5) Input/output error
> 2017-09-27 00:55:52.128147 7faceba7b700 -1 os/filestore/FileStore.cc:
> In function 'virtual int FileStore::read(const coll_t&, const
> ghobject_t&, uint64_t, size_t, ceph::bufferlist&, uint32_t, bool) '
> thread 7faceba7b700 time 2017-09-27 00:55:52.101208
> os/filestore/FileStore.cc: 3016: FAILED assert(0 == "eio on pread")
>
> ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x7fad141219f5]
> 2: (FileStore::read(coll_t const&, ghobject_t const&, unsigned long,
> unsigned long, ceph::buffer::list&, unsigned int, bool)+0xd91)
> [0x7fad13df0661]
> ...
>
> then restart
>
> 2017-09-27 00:56:26.859560 7fd53b8b0800 0 osd.276 341224 load_pgs
> 2017-09-27 00:56:35.881155 7fd53b8b0800 0 osd.276 341224 load_pgs
> opened 167 pgs
> 2017-09-27 00:56:35.881531 7fd53b8b0800 0 osd.276 341224 using 0 op
> queue with priority op cut off at 64.
> 2017-09-27 00:56:35.883165 7fd53b8b0800 -1 osd.276 341224
> log_to_monitors {default=true}
> 2017-09-27 00:56:36.875207 7fd53b8b0800 0 osd.276 341224 done with
> init, starting boot process
>
> then deep-scrub (manually)
>
> 2017-09-27 10:34:22.481133 7fd50b196700 0 log_channel(cluster) log
> [INF] : 9.103 deep-scrub starts
> 2017-09-27 10:44:05.819995 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 shard 276: soid
> 9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head candidate
> had a read error
> 2017-09-27 10:55:22.272655 7fd508991700 -1 log_channel(cluster) log
> [ERR] : 9.103 deep-scrub 0 missing, 1 inconsistent objects
> 2017-09-27 10:55:22.272663 7fd508991700 -1 log_channel(cluster) log
> [ERR] : 9.103 deep-scrub 1 errors
>
> then repair
>
> 2017-09-27 11:01:10.648292 7fd50b196700 0 log_channel(cluster) log
> [INF] : 9.103 repair starts
> 2017-09-27 11:08:57.329571 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 shard 276: soid
> 9:c086eeb2:::rbd_data.6592c12eb141f2.0000000000058795:head candidate
> had a read error
> 2017-09-27 11:18:42.498541 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 repair 0 missing, 1 inconsistent objects
> 2017-09-27 11:18:42.498730 7fd50b196700 -1 log_channel(cluster) log
> [ERR] : 9.103 repair 1 errors, 1 fixed
>
>
> The second behavior is more linked to the hardware and i just want to
> give a drawback of using raid0 disk
>
> On our cluster, whe used OSD hosts with JBOD and some olders with RAID0 disks
> On RAID0 hosts, we notice than when a IO error occured, the OSD never
> catch the information (it only appears in dmesg). On OSD log, we can
> only see the thread timing out and blocked requests. The OSD behaviour
> then depend of the load. Sometime it survive, sometime not but from a
> client side, i think it's an IO error for sure. It's probably linked
> to my hardware (SmartArray on HP SL4540) but maybe not ...
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2017-09-27 18:04 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-27 14:37 Strange behavior of OSD after an IO error Vincent Godin
2017-09-27 18:04 ` David Zafman
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.