* Ceph data consistency
@ 2014-12-30  8:21 Paweł Sadowski
From: Paweł Sadowski @ 2014-12-30  8:21 UTC (permalink / raw)
  To: ceph-devel

Hi,

From time to time our Ceph cluster ends up with inconsistent PGs
(found by deep-scrub). We have some issues with disks/SATA cables/the LSI
controller that cause occasional IO errors (but that's not the point
in this case).

When an IO error occurs on the OSD journal partition, everything works as it
should: the OSD crashes, and that's OK because Ceph will handle that.

But when an IO error occurs on the OSD data partition during a journal flush,
the OSD continues to work. After calling *writev* (in buffer::list::write_fd)
the OSD does check the return code from this call but does NOT verify that the
write actually reached the disk (the data is still only in memory and there is
no fsync). That way the OSD thinks the data has been stored on disk, but it
might be discarded (during sync the dirty page will be reclaimed and you'll
see "lost page write due to I/O error" in dmesg).
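
To make the gap concrete, here is a minimal sketch of the write-then-fsync
pattern in plain POSIX C++. It is not Ceph code and not a proposed patch: the
names write_and_sync, write_file_durably and kMaxIovPerCall are hypothetical,
and the sketch assumes that fsync(2) reports the error when the kernel fails
to write back the dirty pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

// Assumption: a conservative per-call cap, chosen to stay below IOV_MAX.
static const size_t kMaxIovPerCall = 512;

// Hypothetical helper (not part of Ceph): write every buffer to fd with
// writev(), then call fsync() and check its result as well.
// Returns 0 on success or a negative errno value on failure.
int write_and_sync(int fd, const std::vector<std::string>& bufs)
{
  std::vector<iovec> iov;
  for (const std::string& b : bufs) {
    if (b.empty())
      continue;                       // nothing to write for this buffer
    iovec v;
    v.iov_base = const_cast<char*>(b.data());
    v.iov_len = b.size();
    iov.push_back(v);
  }

  size_t idx = 0;
  while (idx < iov.size()) {
    int cnt = static_cast<int>(std::min(iov.size() - idx, kMaxIovPerCall));
    ssize_t n = ::writev(fd, &iov[idx], cnt);
    if (n < 0) {
      if (errno == EINTR)
        continue;                     // interrupted, retry the same call
      return -errno;                  // writev itself failed
    }
    if (n == 0)
      return -EIO;                    // no progress, avoid looping forever
    // Skip the buffers that were written completely.
    size_t left = static_cast<size_t>(n);
    while (idx < iov.size() && left >= iov[idx].iov_len) {
      left -= iov[idx].iov_len;
      ++idx;
    }
    // Short write: adjust the partially written buffer and go again.
    if (idx < iov.size() && left > 0) {
      iov[idx].iov_base = static_cast<char*>(iov[idx].iov_base) + left;
      iov[idx].iov_len -= left;
    }
  }

  // A successful writev() only means the data reached the page cache.
  // fsync() is the call that can surface a failed write to the device.
  if (::fsync(fd) < 0)
    return -errno;
  return 0;
}

// Hypothetical convenience wrapper: open path, write the buffers, sync, close.
int write_file_durably(const char* path, const std::vector<std::string>& bufs)
{
  assert(path != nullptr);
  int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -errno;
  int r = write_and_sync(fd, bufs);
  if (::close(fd) < 0 && r == 0)
    r = -errno;
  return r;
}
```

A caller would treat any negative return value as "the data may not be on
disk", which is the check that appears to be missing after the writev call
described above. The cost is one fsync per flush, so whether this is
acceptable for the OSD is a separate performance question.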

Since there is no checksumming of data, I just wanted to make sure that
this is by design. Maybe there is a way to tell the OSD to call fsync after
the write so that the data stays consistent?


-- 
Cheers,
PS


Thread overview: 6+ messages
2014-12-30  8:21 Ceph data consistency Paweł Sadowski
2014-12-30 12:40 ` Vijayendra Shamanna
2014-12-30 13:40   ` Paweł Sadowski
     [not found]     ` <CALurOm0wGEJ5MSrscUvVi_J3fyDevGbT3A11291qTLYTZejr_w@mail.gmail.com>
2015-01-07  1:41       ` Ma, Jianpeng
2015-01-07  2:18         ` Sage Weil
2015-01-07  5:59           ` Ma, Jianpeng
