From: "Paweł Sadowski" <ceph@sadziu.pl>
To: Vijayendra Shamanna <Vijayendra.Shamanna@sandisk.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Ceph data consistency
Date: Tue, 30 Dec 2014 14:40:27 +0100
Message-ID: <54A2AB4B.6000206@sadziu.pl>
In-Reply-To: <085BC98701B92B4C86AE7696562D284D0B9CE2F3@SACMBXIP03.sdcorp.global.sandisk.com>
On 12/30/2014 01:40 PM, Vijayendra Shamanna wrote:
> Hi,
>
> There is a sync thread (sync_entry in FileStore.cc) which triggers periodically and executes sync_filesystem() to ensure that the data is consistent. The journal entries are trimmed only after a successful sync_filesystem() call
sync_filesystem() always returns zero, so the journal will be trimmed regardless. On a failing disk, executing sync()/syncfs() with dirty data in disk buffers will result in data loss ("lost page write due to I/O error").
I ran some experiments simulating disk errors with the Device Mapper "error" target. In this setup the OSD kept writing to the broken disk without crashing. Every 5 seconds (filestore_max_sync_interval) the kernel logged that some data had been discarded due to an I/O error.
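To make the point concrete, here is a minimal sketch of a commit step that trims the journal only when the sync step reports success. All names (commit_and_trim, sync_fs_checked, sync_fn, trim_fn) are hypothetical and this is not the FileStore code. Note also that, as far as I know, older kernels report only EBADF from syncfs(), so checking its return value helps only where the kernel actually propagates writeback errors.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // syncfs() is a Linux/glibc extension
#endif
#include <cerrno>
#include <functional>
#include <unistd.h>

// Hypothetical commit step, not the actual FileStore::sync_entry logic.
// sync_fn stands in for sync_filesystem(); trim_fn stands in for journal
// trimming. Returns 0 on success or the negative error from sync_fn.
int commit_and_trim(const std::function<int()>& sync_fn,
                    const std::function<void()>& trim_fn) {
    int r = sync_fn();
    if (r < 0) {
        // Keep the journal: the data may never have reached the disk.
        return r;
    }
    trim_fn();
    return 0;
}

// One possible sync_fn: syncfs() on a descriptor inside the data filesystem,
// with the result converted to 0 or a negative errno value.
int sync_fs_checked(int fd) {
    return ::syncfs(fd) == 0 ? 0 : -errno;
}
```

With this shape, a failed sync leaves the journal entries in place, so they can still be replayed or the OSD can be stopped, instead of the data silently disappearing.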
> Thanks
> Viju
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Pawel Sadowski
>> Sent: Tuesday, December 30, 2014 1:52 PM
>> To: ceph-devel@vger.kernel.org
>> Subject: Ceph data consistency
>>
>> Hi,
>>
>> On our Ceph cluster we occasionally see inconsistent PGs (after deep-scrub). We have some issues with disks/SATA cables/the LSI controller causing IO errors from time to time (but that's not the point in this case).
>>
>> When an IO error occurs on the OSD journal partition everything works as it should -> the OSD crashes and that's ok - Ceph will handle that.
>>
>> But when an IO error occurs on the OSD data partition during journal flush, the OSD continues to work. After calling *writev* (in buffer::list::write_fd) the OSD does check the return code from this call but does NOT verify that the write actually reached the disk (the data is still only in memory and there is no fsync). That way the OSD thinks the data has been stored on disk, but it might be discarded (during sync the dirty page will be reclaimed and you'll see "lost page write due to I/O error" in dmesg).
>>
>> Since there is no checksumming of data I just wanted to make sure that this is by design. Maybe there is a way to tell OSD to call fsync after write and have data consistent?
--
PS